ABSTRACT

Motivated by recent observational studies of the environment of z ~ 6 QSOs, we have used the Millennium Run (MR) simulations to construct a very large (~4° × 4°) mock redshift survey of star-forming galaxies at z ~ 6. We use this simulated survey to study the relation between density enhancements in the distribution of i775-dropouts and Lyα emitters, and their relation to the most massive halos and protocluster regions at z ~ 6. Our simulation predicts significant variations in surface density across the sky, with some voids and filaments extending over scales of 1°, much larger than probed by current surveys. Approximately one third of all z ~ 6 halos hosting i-dropouts brighter than z850 = 26.5 mag (≈ M*_UV,z=6) become part of z = 0 galaxy clusters. i-dropouts associated with protocluster regions are found in regions where the surface density is enhanced on scales ranging from a few to several tens of arcminutes on the sky. We analyze two structures of i-dropouts and Lyα emitters observed with the Subaru Telescope and show that these structures must be the seeds of massive clusters-in-formation. In striking contrast, six z ~ 6 QSO fields observed with HST show no significant enhancements in their i775-dropout number counts. With the present data, we cannot rule out the QSOs being hosted by the most massive halos. However, neither can we confirm this widely used assumption. We conclude by giving detailed recommendations for the interpretation and planning of observations by current and future ground- and space-based instruments that will shed new light on questions related to the large-scale structure at z ~ 6.

Key words: cosmology: observations - early universe - large-scale structure of universe - 
theory - galaxies: high-redshift - galaxies: clusters: general - galaxies: starburst. 



1 INTRODUCTION 

During the first decade of the third millennium we have begun to put observational constraints on the status quo of galaxy formation at roughly one billion years after the Big Bang (e.g. Stanway et al. 2003; Yan & Windhorst 2004a; Bouwens et al. 2003, 2004a, 2006; Dickinson et al. 2004; Malhotra et al. 2005; Shimasaku et al. 2005; Ouchi et al. 2005; Overzier et al. 2006). Statistical samples of star-forming galaxies at z ~ 6, selected either on the basis of their large i − z colour due to the Lyman break redshifted to z ~ 6 (i-dropouts), or on the basis of the large equivalent width of their Lyα emission (Lyα emitters), suggest that they are analogous to the population of Lyman break galaxies (LBGs) found at z ~ 3-5 (e.g. Bouwens et al. 2007). A small subset of the i775-dropouts has been found to be surprisingly massive or old (Dow-Hygelund et al. 2005; Yan et al. 2006; Eyles et al. 2007). The slope of the UV luminosity function at z ~ 6 is very steep and implies that low-luminosity objects contributed significantly to reionizing the Universe (Yan & Windhorst 2004b; Bouwens et al. 2007; Khochfar et al. 2007; Overzier et al. 2008a). Cosmological hydrodynamic simulations are being used to reproduce the abundances as well as the spectral energy distributions of z ~ 6 galaxies. Exactly how these objects are connected to local galaxies remains a highly active area of research (e.g. Davé et al. 2006; Gayler Harford & Gnedin 200i; Robertson et al. 2007; Nagamine et al. 2006, 2008; Night et al. 2006; Finlator et al. 20o').



The discovery of highly luminous quasi-stellar objects (QSOs) at z ~ 6 (e.g. Fan et al. 2001, 2003, 2004, 2006a; Goto 2006; Venemans et al. 2007) is of equal importance to our understanding of the formation of the first massive black holes and galaxies. Gunn & Peterson (1965) absorption troughs in their spectra demarcate the end of the epoch of reionization (e.g. Fan et al. 2001; White et al. 2003; Walter et al. 2004; Fan et al. 2006b). Assuming that high-redshift QSOs are radiating near the Eddington limit, they contain supermassive black holes (SMBHs) of mass ≳ 10^9 M⊙ (e.g. Willott et al. 2003; Barth et al. 2003; Vestergaard 2004; Jiang et al. 2007; Kurk et al. 2007). The spectral properties of most z ~ 6 QSOs in the rest-frame UV, optical, IR and X-ray are similar to those at low redshift, suggesting that massive and highly chemically enriched galaxies were vigorously forming stars and SMBHs less than one billion years after the Big Bang (e.g. Bertoldi et al. 2003; Maiolino et al. 2005; Jiang et al. 2006; Wang et al. 2007).

Hierarchical formation models and simulations can reproduce the existence of such massive objects at early times (e.g. Haiman & Loeb 2001; Springel et al. 2005a; Begelman et al. 2006; Volonteri & Rees 2006; Li et al. 2007; Narayanan et al. 2007), provided, however, that they are situated in extremely massive halos. Large-scale gravitational clustering is a powerful method for estimating the halo masses of quasars at low redshift, but it cannot be applied to z ~ 6 QSOs because too few such systems are known. Their extremely low space density of ~1 Gpc^-3 (comoving), determined from the Sloan Digital Sky Survey (SDSS), implies a (maximum) halo mass of M_halo ~ 10^13 M⊙ (Fan et al. 2001; Li et al. 2007). A similar halo mass is obtained when extrapolating from the (z = 0) relationship between black hole mass and bulge mass of Magorrian et al. (1998), and using Ω_M/Ω_bar ~ 10 (Fan et al. 2001). Because the descendants of the most massive halos at z ~ 6 may evolve into halos of > 10^15 M⊙ at z = 0 in a ΛCDM model (e.g. Springel et al. 2005a; Suwa et al. 2006; Li et al. 2007; but see De Lucia & Blaizot 2007, Trenti et al. 2008 and Sect. 5 of this paper), it is believed that the QSOs trace highly biased regions that may give birth to the most massive present-day galaxy clusters. If this is true, the small-scale environment of z ~ 6 QSOs may be expected to show a significant enhancement in the number of small, faint galaxies. These galaxies may either merge with the QSO host galaxy, or form the first stars and black holes of other (proto-)cluster galaxies.

Observations carried out with the Advanced Camera for Surveys (ACS) on the Hubble Space Telescope (HST) allowed a rough measurement of the two-dimensional overdensities of faint i775-dropouts detected towards the QSOs J0836+0054 at z = 5.8 (Zheng et al. 2006) and J1030+0524 at z = 6.28 (Stiavelli et al. 2005). Recently, Kim et al. (2008) presented results from a sample of 5 QSO fields, finding some to be overdense and some to be underdense with respect to the HST/ACS Great Observatories Origins Deep Survey (GOODS). Priddey et al. (2007) find enhancements in the number counts of sub-mm galaxies. Substantial overdensities of i-dropouts and Lyα emitters have also been found in non-QSO fields (e.g. Shimasaku et al. 2003; Ouchi et al. 2005; Ota et al. 2008), suggesting that massive structures do not always harbour a QSO, which may be explained by invoking a QSO duty cycle. At z ~ 2-5, significant excesses of star-forming galaxies have been found near QSOs (e.g. Djorgovski et al. 2003; Kashikawa et al. 2007), near radio galaxies (e.g. Miley et al. 2004; Venemans et al. 2007; Overzier et al. 2008b), and in random fields (Steidel et al. 1998, 2005). Although the physical interpretation of these measurements is uncertain, these structures are believed to be associated with the formation of clusters of galaxies.

Figure 1. Redshift versus the (comoving) X-coordinate for all objects within a slice of width Δy = 250 h^-1 Mpc along the y-axis.

The idea of verifying the presence of massive structures at high redshift through the clustering of small galaxies around them has recently been explored by, e.g., Muñoz & Loeb (2008a) using the excursion set formalism of halo growth (Zentner 2007). However, the direct comparison between models or simulations and observations remains difficult, mainly because of complicated observational selection effects. This is especially true at high redshift. In order to investigate how the wide variety of galaxy overdensities found in surveys at z ~ 2-6 are related to cluster formation, we have carried out an analysis of the progenitors of galaxy clusters in a set of cosmological N-body simulations. Our results will be presented in a series of papers. In Paper I, we use the Millennium Run simulations (Springel et al. 2005a) to simulate a large mock survey of galaxies at z ~ 6 and derive predictions for the properties of the progenitors of massive galaxy clusters, paying particular attention to the details of observational selection effects. We will try to answer the following questions:

(i) Where do we find the present-day descendants of the i-dropouts?

(ii) What are the typical structures traced by i-dropouts and Lyα emitters in current surveys, and how do they relate to protoclusters?

(iii) How do we unify the (lack of excess) number counts 
observed in QSO fields with the notion that QSOs are hosted by 
the most massive halos at z ~ 6? 

The structure of the present paper is as follows. We describe the simulations and the construction of our mock i-dropout survey in Section 2. Using these simulations, we then address the main questions outlined above in Sections 3-5. We conclude the paper with a discussion (Section 6), an overview of recommendations for future observations (Section 7), and a short summary of the main results (Section 8).



2 SIMULATIONS 

2.1 Simulation description 

We use the semi-analytic galaxy catalogues that are based on the Millennium Run (MR) dark matter simulation of Springel et al. (2005a). Detailed descriptions of the simulations and the semi-analytic modeling have been given extensively elsewhere, and we refer the reader to those works for more information (e.g. Kauffmann et al. 1999; Springel et al. 2005a; Croton et al. 2006; Lemson & Springel 2006; De Lucia et al. 2004; De Lucia & Blaizot 2007, and references therein).

The dark matter simulation was performed with the cosmological simulation code GADGET-2 (Springel 2005b), and consisted of 2160^3 particles of mass 8.6 × 10^8 h^-1 M⊙ in a periodic box of 500 h^-1 Mpc on a side. The simulation followed the gravitational growth as traced by these particles from z = 127 to z = 0 in a ΛCDM cosmology (Ω_m = 0.25, Ω_Λ = 0.75, h = 0.73, n = 1, σ_8 = 0.9) consistent with the WMAP year 1 data (Spergel et al. 2003). The results were stored at 64 epochs ("snapshots"), which were used to construct a detailed halo merger tree during postprocessing, by identifying all resolved dark matter halos and subhalos, and linking all progenitors and descendants of each halo.
Figure 2. The simulation box showing the positions in comoving coordinates of all objects identified as i775-dropout galaxies to z850 = 27.0 mag.

Galaxies were modeled by applying semi-analytic prescriptions of galaxy formation to the stored halo merger trees. The techniques and recipes include gas cooling, star formation, reionization heating, supernova feedback, black hole growth, and "radio-mode" feedback from galaxies with a static hot gas atmosphere, and are described in Croton et al. (2006). The photometric properties of galaxies are then modeled using stellar population synthesis models, including a simple dust model. Here we use the updated models 'delucia2006a' of De Lucia & Blaizot (2007) that have been made publicly available through an advanced database structure on the MR website (Lemson & Virgo Consortium 2006).



2.2 Construction of a large mock survey at z ~ 6

We used the discrete MR snapshots to create a large, continuous mock survey of i775-dropout galaxies at z ~ 6. The general principle of transforming a series of discrete snapshots into a mock pencil-beam survey entails placing a virtual observer somewhere in the simulation box at z = 0 and carving out all galaxies as they would be observed in a pencil-beam survey along that observer's line of sight. This technique has been described in great detail by Blaizot et al. (2005) and Kitzbichler & White (2007). In general, one starts with snapshot i = 63 at z = 0 and records the positions, velocities and physical properties of all galaxies within the cone out to a comoving distance corresponding to that of the next snapshot. For the next segment of the cone, one then uses the properties as recorded in snapshot i = 62, and so on. The procedure relies on the reasonable assumption that the large-scale structure (positions and velocities of galaxies) evolves relatively slowly between snapshots. By replicating the simulation box along the lightcone and limiting the opening angle of the cone, one can in principle construct unique lightcones out to very high redshift without crossing any region in the simulation box more than once. The method is straightforward when done in comoving coordinates in a flat cosmology using simple Euclidean geometry (Kitzbichler & White 2007).
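The segment-by-segment rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline of Blaizot et al. (2005) or Kitzbichler & White (2007): it assumes a flat ΛCDM cosmology with the MR parameters (Ω_m = 0.25, Ω_Λ = 0.75), expresses distances in h^-1 Mpc, ignores box replication and peculiar velocities, and the helper names `comoving_distance` and `snapshot_for_distance` are hypothetical.

```python
import math

C_OVER_H0 = 2997.92458         # c / H0 in h^-1 Mpc
OMEGA_M, OMEGA_L = 0.25, 0.75  # MR cosmology (flat)


def comoving_distance(z, steps=2000):
    """Line-of-sight comoving distance to redshift z in h^-1 Mpc (Simpson's rule)."""
    if z <= 0:
        return 0.0
    n = steps + (steps % 2)  # Simpson's rule needs an even number of intervals
    dz = z / n
    total = 0.0
    for k in range(n + 1):
        zk = k * dz
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight / math.sqrt(OMEGA_M * (1 + zk) ** 3 + OMEGA_L)
    return C_OVER_H0 * total * dz / 3.0


def snapshot_for_distance(d, snapshot_redshifts):
    """Index of the snapshot that supplies galaxies at comoving distance d.

    Following the rule in the text, the snapshot at z_k fills the cone from its
    own comoving distance out to that of the next (higher-z) snapshot.
    snapshot_redshifts must be sorted in increasing order.
    """
    edges = [comoving_distance(z) for z in snapshot_redshifts]
    if d < edges[0]:
        raise ValueError("distance lies in front of the first snapshot")
    for k in range(len(edges) - 1):
        if edges[k] <= d < edges[k + 1]:
            return k
    return len(edges) - 1  # beyond the last edge: keep using the highest-z snapshot


# Which of the three z ~ 6 snapshots would supply a galaxy observed at z = 6.5?
print(snapshot_for_distance(comoving_distance(6.5), [5.7, 6.2, 6.7]))  # 1
```

Because the survey in this paper uses snapshots described as centred on their segments, a closer match to it would place the segment edges at the midpoints between snapshot distances; the generic "out to the next snapshot" rule is kept here because it is the one spelled out in the paragraph above.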

Because the comoving distances or redshifts of galaxies recorded at a particular snapshot do not correspond exactly to their effective position along the lightcone, we need to correct their magnitudes by interpolating over redshift as follows:

$$M_{\rm cor}[z(d)] = M(z_i) + \frac{dM}{dz}\,\bigl[z(d) - z_i\bigr], \qquad (1)$$

where M_cor[z(d)] is the observer-frame absolute magnitude at the observed redshift z(d) (including peculiar velocities along the line of sight), M(z_i) is the magnitude at redshift z_i corresponding to the ith snapshot, and dM/dz is the first-order derivative of the observer-frame absolute magnitude with respect to redshift. The latter quantity is calculated for each galaxy by placing it at neighbouring snapshots, and ensures that the K-correction is taken into account (Blaizot et al. 2005). Finally, we apply the mean attenuation of the intergalactic medium using Madau (1995) and calculate the observer-frame apparent magnitudes in each filter.
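Equation (1) is a first-order Taylor expansion of the magnitude in redshift. A minimal sketch, assuming dM/dz is estimated by a simple finite difference between the magnitudes the galaxy would have at the two neighbouring snapshots; the function names are hypothetical and the magnitudes in the example are made up purely to exercise the arithmetic.

```python
def dm_dz(mag_prev, z_prev, mag_next, z_next):
    """Finite-difference estimate of dM/dz from two neighbouring snapshots."""
    return (mag_next - mag_prev) / (z_next - z_prev)


def corrected_magnitude(mag_snap, z_snap, z_obs, slope):
    """Eq. (1): M_cor[z(d)] = M(z_i) + (dM/dz) * [z(d) - z_i]."""
    return mag_snap + slope * (z_obs - z_snap)


# Made-up example: a galaxy stored at z_i = 6.2 but observed at z(d) = 6.3.
slope = dm_dz(mag_prev=-20.10, z_prev=5.7, mag_next=-19.90, z_next=6.7)
print(round(corrected_magnitude(-20.00, 6.2, 6.3, slope), 3))  # -19.98
```

A centred difference over the two neighbours is the simplest choice; any estimate of the local slope (for example a one-sided difference at the ends of the snapshot list) plugs into `corrected_magnitude` unchanged.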

In this paper, we use the fact that the selection of z ~ 6 galaxies through the i-dropout technique is largely free of contamination from objects at lower (and higher) redshift (Bouwens et al. 2006), provided that the observations are deep enough. Because the comoving length of the MR simulation box (500 h^-1 Mpc) corresponds roughly to the comoving distance between z ~ 5.6 and z ~ 7.3 (the typical redshift range of i-dropout surveys), we can use the three simulation snapshots centred at z = 5.7, z = 6.2 and z = 6.7 to create a mock survey spanning this volume, while safely neglecting objects at other redshifts.

We extracted galaxies from the MR database by selecting the Z-axis of the simulation box to lie along the line of sight of our mock field. In order to compare with the deepest current surveys, we calculated the apparent magnitudes in the HST/ACS V606, i775 and z850 filters and the 2MASS J, H and Ks filters. We derived observed redshifts from the comoving distance along the line of sight (including the effect of peculiar velocities), applied the K-corrections and IGM absorption, and calculated the apparent magnitudes in each band. Fig. 1 shows the spatial X-coordinate versus the redshift of objects in the simulated lightcone. Fig. 2 shows the entire simulated volume projected along the Z- or redshift axis. These figures show significant filamentary and strongly clustered substructure at z ~ 6, both parallel and perpendicular to the line of sight.

Our final mock survey has a comoving volume of ~0.3 Gpc^3, and spans an area of 4.4° × 4.4° when projected onto the sky. It contains ~1.6 × 10^5 galaxies at z = 5.6-7.3 with z850 ≤ 27.5 mag (corresponding to an absolute magnitude¹ of M_UV,AB = −19.2 mag, about one magnitude below M*_UV at z = 6). For comparison and future reference, we list the main i-dropout surveys together with their areal coverage and detection limit in Table 1.
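The survey geometry quoted above can be cross-checked with a short calculation. The sketch assumes a flat ΛCDM cosmology with the MR parameters, computes comoving distances by simple numerical integration, and interprets the 4.4° field width as the angle subtended by one 500 h^-1 Mpc box side at the far (z ≈ 7.3) end of the survey; that last interpretation is our assumption, not a statement made in the text.

```python
import math

C_OVER_H0 = 2997.92458                  # c / H0 in h^-1 Mpc
OMEGA_M, OMEGA_L, H = 0.25, 0.75, 0.73  # MR cosmology (flat)
BOX = 500.0                             # MR box side in h^-1 Mpc


def comoving_distance(z, n=2000):
    """Comoving distance in h^-1 Mpc for flat LCDM (Simpson's rule, n even)."""
    dz = z / n
    total = 0.0
    for k in range(n + 1):
        zk = k * dz
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight / math.sqrt(OMEGA_M * (1 + zk) ** 3 + OMEGA_L)
    return C_OVER_H0 * total * dz / 3.0


# 1. Line-of-sight depth of the z ~ 5.6-7.3 slice.
depth = comoving_distance(7.3) - comoving_distance(5.6)
# 2. Volume of one box, converted from h^-3 Mpc^3 to Gpc^3 with h = 0.73.
volume_gpc3 = (BOX / H) ** 3 / 1e9
# 3. Angle subtended by one box side at the far (z = 7.3) end of the cone,
#    using the flat-space small-angle relation theta = size / D_C.
width_deg = math.degrees(BOX / comoving_distance(7.3))

print(f"depth ~ {depth:.0f} h^-1 Mpc, volume ~ {volume_gpc3:.2f} Gpc^3, "
      f"width ~ {width_deg:.1f} deg")
```

Under these assumptions the slice depth comes out close to the 500 h^-1 Mpc box size, the volume close to 0.3 Gpc^3 and the width close to 4.4°, which is consistent with the numbers in the text.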

2.3 Colour-colour selection 

In the left panel of Fig. 3 we show the V606 − z850 vs. i775 − z850 colour-colour diagram for all objects satisfying z850 ≤ 27.0. The i-dropouts populate a region in colour-colour space that is effectively isolated from lower-redshift objects using a simple colour cut of i775 − z850 > 1.3-1.5. Note that although our simulated survey only contains objects at z > 5.6, it has been shown (Stanway et al. 2003; Dickinson et al. 2004; Bouwens et al. 2004, 2006) that this colour cut is an efficient selection criterion for isolating starburst galaxies at z ~ 6 with blue z850 − J colours (see right panel of Fig. 3).

¹ The rest-frame absolute magnitude at 1350 Å is defined as $M_{1350\,\text{\AA}} = m_{z850} - 5\log_{10}(d_L/10\,\mathrm{pc}) + 2.5\log_{10}(1+z)$, where d_L is the luminosity distance.

Figure 3. Colour-colour diagrams of the MR mock i775-dropout survey. To guide the eye we have indicated tracks showing the colours of a 100 Myr old continuous starburst model from Bruzual & Charlot (2003) for different amounts of reddening in E(B − V) of 0.0 (blue), 0.2 (green), and 0.4 (red). Redshifts are indicated along the zero-reddening track. Only objects at z > 5.6 are included in the simulations, as i775-dropout surveys have been demonstrated to have very little contamination (see text for details).

For reference, we have overplotted colour tracks for a 100 Myr old continuous star formation model as a function of redshift; different colour curves show results for different amounts of reddening by dust. As can be seen, these simple models span the region of colour-colour space occupied by the MR galaxies. At z < 6, galaxies occupy a tight sequence in the plane. At z > 6, objects fan out because the V606 − z850 colour changes strongly as a function of redshift, while the i775 − z850 colour is more sensitive to both age and dust reddening. Because of the possibility of intrinsically red interlopers at z ~ 1-3, the additional requirement of a non-detection in V606, or of a very red colour V606 − z850 ≳ 3 if available, is often included in the selection². Because the selection based on i775 − z850 > 1.3 introduces a small bias against objects having strong Lyα emission at z < 6 (Malhotra et al. 2005; Stanway et al. 2007), we have statistically included the effect of Lyα on our sample selection by randomly assigning Lyα with a rest-frame equivalent width of 30 Å to 25% of the galaxies in our volume, and recalculating the i775 − z850 colours. The inclusion of Lyα reduces the number of selected objects by ~3% (see also Bouwens et al. 2006).
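The statistical treatment of Lyα described above can be sketched as follows. The simplifications are ours, not the paper's: the line is assumed to fall entirely inside the i775 band (roughly the situation for the lower-redshift part of the sample), the continuum is taken as flat in f_λ across a top-hat filter whose width (`band_width`, 1500 Å here) is a hypothetical value, and the input colours are made up. Only the 30 Å rest-frame equivalent width and the 25% fraction come from the text.

```python
import math
import random


def lya_brightening(ew_rest_angstrom, z, band_width_angstrom):
    """Magnitude change (negative = brighter) when a Lya line of the given
    rest-frame equivalent width falls inside a top-hat band, assuming a
    flat f_lambda continuum across the band."""
    ew_obs = ew_rest_angstrom * (1.0 + z)
    return -2.5 * math.log10(1.0 + ew_obs / band_width_angstrom)


def select_idropouts(i_minus_z, cut=1.3):
    """Indices of objects passing the colour cut i775 - z850 > cut."""
    return [k for k, colour in enumerate(i_minus_z) if colour > cut]


def apply_random_lya(i_minus_z, redshifts, frac=0.25, ew_rest=30.0,
                     band_width=1500.0, seed=1):
    """Add Lya to a random fraction of objects and return the new i - z colours.

    Simplification: the line is assumed to fall in i775 only, so only the
    i775 magnitude brightens and the colour becomes bluer.
    """
    rng = random.Random(seed)
    colours = list(i_minus_z)
    for k in rng.sample(range(len(colours)), round(frac * len(colours))):
        colours[k] += lya_brightening(ew_rest, redshifts[k], band_width)
    return colours


# Made-up colours for eight objects near the selection boundary.
colours = [1.20, 1.35, 1.40, 1.80, 2.50, 1.31, 1.50, 3.00]
zs = [5.8] * len(colours)
before = len(select_idropouts(colours))
after = len(select_idropouts(apply_random_lya(colours, zs)))
print(before, after)  # 'after' can only be <= 'before'
```

With these assumptions a 30 Å line brightens i775 by roughly 0.14 mag, so only objects lying just above the colour cut can drop out of the sample, which is why the net effect on the number counts is small.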



2.4 i-dropout number densities 

In Table 2 we list the surface densities of i775-dropouts selected in the MR mock survey as a function of limiting z850 magnitude and field size. For comparison, we calculated the surface densities for regions having areas comparable to some of the main i775-dropout surveys: the SDF (876 arcmin²), two GOODS fields (320 arcmin²), a single GOODS field (160 arcmin²), and a HUDF-sized field (11.2 arcmin²). The errors in Table 2 indicate the ±1σ deviation measured among a large number of similarly sized fields selected from the mock survey, and can be taken as an estimate of the influence of (projected) large-scale structure on the number counts (usually referred to as "cosmic variance" or "field-to-field variations"). At faint magnitudes, the strongest observational constraints on the i775-dropout density come from the HST surveys. Our values for a GOODS-sized survey are 105%, 55% and 82% of the values given by the most recent estimates by B07 for limiting z850 magnitudes of 27.5, 26.5 and 26.0 mag, respectively, and are consistent with them within the cosmic variance expected from our mock survey. Because the total area surveyed by B06 is about 200 times smaller than our mock survey, we also compare our results to the much larger SDF survey of Ota et al. (2008). At z850 = 26.5 mag the number densities of ~0.18 arcmin^-2 derived from the real and mock surveys are perfectly consistent.
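The ±1σ field-to-field scatter in Table 2 comes from counting objects in many similarly sized fields drawn from the mock survey. A minimal counts-in-cells version of that measurement is sketched below. It is an illustration, not the procedure used to build Table 2: positions are taken in arcmin on a flat patch, cells are square, and the catalogue in the example is a hypothetical unclustered (uniform random) one with the 0.18 arcmin^-2 density quoted above, so its scatter reflects Poisson noise only. Applied to the clustered mock, the same measurement would also pick up cosmic variance.

```python
import random
import statistics


def cell_densities(points, field_size, cell_size, n_cells=2000, seed=0):
    """Surface densities in randomly placed square cells.

    points: (x, y) positions in arcmin inside a square field of side field_size.
    Each cell of side cell_size lies entirely within the field.
    Returns densities in objects per arcmin^2.
    """
    rng = random.Random(seed)
    area = cell_size ** 2
    densities = []
    for _ in range(n_cells):
        x0 = rng.uniform(0.0, field_size - cell_size)
        y0 = rng.uniform(0.0, field_size - cell_size)
        count = sum(1 for x, y in points
                    if x0 <= x < x0 + cell_size and y0 <= y < y0 + cell_size)
        densities.append(count / area)
    return densities


# Hypothetical unclustered catalogue: 0.18 objects per arcmin^2 in a 60' x 60' patch.
rng = random.Random(42)
field = 60.0
points = [(rng.uniform(0, field), rng.uniform(0, field)) for _ in range(648)]
dens = cell_densities(points, field, cell_size=12.6)  # ~GOODS-sized cells (~160 arcmin^2)
print(f"mean {statistics.mean(dens):.3f}, 1-sigma {statistics.stdev(dens):.3f} arcmin^-2")
```

The ratio of the scatter measured in the clustered mock to this Poisson-only scatter is one simple way to express how much large-scale structure inflates the field-to-field variations.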

Last, we note that in order to achieve agreement between the observed and simulated number counts at z ~ 6, we did not require any tweaks to either the cosmology (see e.g. Wang et al. 2008 for the effect of different WMAP cosmologies) or the dust model used (see Kitzbichler & White 2007; Guo & White 2008, for alternative dust models better tuned to high-redshift galaxies). This issue may be investigated further in a future paper.



² The current paper uses magnitudes and colours defined in the HST/ACS V606 i775 z850 filter system in order to compare with the deepest surveys available in the literature. Other works based on ground-based data commonly use the SDSS-based r′i′z′ filter set, but the differences in colours are minimal.



2.5 Redshift distribution 

In Fig. 4 we show the redshift distribution of the full mock survey (thick solid line), along with various subsamples selected according to different i775 − z850 colour cuts that we will refer to later on in this paper. The standard selection of i775 − z850 > 1.3 results in a distribution that peaks at z ~ 5.8. We have also indicated the expected scatter resulting from cosmic variance on the scale of GOODS-sized fields (error bars are 1σ). Some, mostly ground-based, studies use a more stringent cut of i775 − z850 > 1.5 to reduce the chance of foreground interlopers (dashed histogram). Other works have used colour cuts of i775 − z850 ≲ 2 (blue histogram) and i775 − z850 ≳ 2 (red histogram) in order to try to extract subsamples at z < 6 and z > 6, respectively. As can be seen in Fig. 4, such cuts are indeed successful at broadly separating sources from the two redshift ranges, although the separation is not perfectly clean due to the mixed effects of age, dust and redshift on the i775 − z850 colour. For reference, we have also indicated the model redshift distribution from B06 (thin solid line). This redshift distribution was derived for a much fainter sample of z850 ≲ 29 mag, which explains in part the discrepancy in the counts at z > 6.2. Evolution across the redshift range will furthermore skew the actual redshift distribution toward lower values (see discussion in Muñoz & Loeb 2008b). This is not included in the B06 model, and its effect is only marginally taken into account in the MR mock survey due to the relatively sparse snapshot sampling across the redshift range. Unfortunately, the exact shape of the redshift distribution is currently not very well constrained by spectroscopic samples (Malhotra et al. 2005). A more detailed analysis is beyond the scope of this paper, and we conclude by noting that the results presented below are largely independent of the exact shape of the distribution.

Figure 4. Redshift histograms derived from the MR mock i-dropout survey at a depth of z850 = 27.5 mag using the selection criteria i775 − z850 > 1.3 (thick solid line; error bars indicate the 1σ scatter expected among GOODS-sized fields), i775 − z850 > 1.5 (dashed line), i775 − z850 < 2.0 (blue line), and i775 − z850 > 2.0 (red line). The thin solid line indicates the model redshift distribution from B06 based on the HUDF.

Figure 5. Physical properties of i-dropouts in the MR mock survey satisfying z850 ≤ 26.5 mag. We plot the cumulative fractions of galaxies with stellar masses, star formation rates, stellar ages and halo masses greater than a given value. Top left: distribution of stellar masses. The median stellar mass is 4 × 10^9 M⊙ (dotted line). Top right: distribution of SFRs. The median SFR is ~30 M⊙ yr^-1 (dotted line). Bottom left: distribution of mass-weighted ages. The median age is ~150 Myr (dotted line). Bottom right: distribution of halo masses. The median halo mass is ~2 × 10^11 h^-1 M⊙ (dotted line).

2.6 Physical properties of i-dropouts 

Although a detailed study of the successes and failures of the semi-analytic modeling of galaxies at z ~ 6 is not the purpose of our investigation, we believe it will be instructive for the reader if we at least summarize the main physical properties of the model galaxies in our mock survey. Unless stated otherwise, throughout this paper we will limit our investigations to i-dropout samples having a limiting magnitude of z850 = 26.5 mag³, comparable to M*_UV at z ~ 6 (see Bouwens et al. 2007). This magnitude typically corresponds to model galaxies situated in dark matter halos of at least 100 dark matter particles (~10^11 h^-1 M⊙). This ensures that the evolution of those halos and their galaxies has been traced for some time prior to the snapshot from which the galaxy was selected, so that the physical quantities derived from the semi-analytic model are relatively stable against snapshot-to-snapshot fluctuations. A magnitude limit of z850 = 26.5 mag also conveniently corresponds to the typical depth that can be achieved in deep ground-based surveys or relatively shallow HST-based surveys.
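The magnitude conversions used in this section (the rest-frame absolute magnitude for z850 = 27.5 mag and the star formation rate quoted for z850 = 26.5 mag) can be reproduced with a short calculation. The sketch assumes the MR cosmology, places every object at a representative z = 6, treats the z850 flux as tracing the rest-frame UV continuum (at z ~ 6 the band actually samples rest-frame wavelengths somewhat shorter than 1500 Å), and ignores dust and IGM attenuation; the function names are hypothetical.

```python
import math

C_KM_S = 299792.458              # speed of light in km/s
H0 = 73.0                        # km/s/Mpc (h = 0.73, MR cosmology)
OMEGA_M, OMEGA_L = 0.25, 0.75
MPC_CM = 3.0857e24               # centimetres per Mpc
MADAU_FACTOR = 8.0e27            # erg s^-1 Hz^-1 per (Msun/yr), Madau et al. (1998)


def luminosity_distance_mpc(z, n=2000):
    """Luminosity distance in Mpc for flat LCDM (Simpson's rule, n even)."""
    dz = z / n
    total = 0.0
    for k in range(n + 1):
        zk = k * dz
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight / math.sqrt(OMEGA_M * (1 + zk) ** 3 + OMEGA_L)
    comoving = (C_KM_S / H0) * total * dz / 3.0
    return (1.0 + z) * comoving


def absolute_magnitude(m_ab, z):
    """M = m - 5 log10(d_L / 10 pc) + 2.5 log10(1 + z)."""
    d_pc = luminosity_distance_mpc(z) * 1.0e6
    return m_ab - 5.0 * math.log10(d_pc / 10.0) + 2.5 * math.log10(1.0 + z)


def sfr_from_ab_mag(m_ab, z):
    """UV SFR in Msun/yr for an unattenuated source, with L_nu = 8e27 * SFR."""
    f_nu = 10.0 ** (-0.4 * (m_ab + 48.6))                 # erg s^-1 cm^-2 Hz^-1
    d_cm = luminosity_distance_mpc(z) * MPC_CM
    l_nu = 4.0 * math.pi * d_cm ** 2 * f_nu / (1.0 + z)   # rest-frame L_nu
    return l_nu / MADAU_FACTOR


print(f"M(z850 = 27.5, z = 6) ~ {absolute_magnitude(27.5, 6.0):.1f}")
print(f"SFR(z850 = 26.5, z = 6) ~ {sfr_from_ab_mag(26.5, 6.0):.1f} Msun/yr")
```

Under these assumptions the two results come out close to the −19.2 mag and ~7 M⊙ yr^-1 quoted in the text.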

In Fig. 5 we plot the cumulative distributions of the stellar mass (top left), SFR (top right), and stellar age (bottom left) of the i775-dropouts in the mock survey. The median stellar mass is ~5 × 10^9 h^-1 M⊙, and about 30% of galaxies have a stellar mass greater than 10^10 M⊙. The median SFR and age are ~30 M⊙ yr^-1 and ~160 Myr, respectively, with extrema of ~500 M⊙ yr^-1 and ~400 Myr. These results are in general agreement with several studies based on modeling the stellar populations of limited samples of i775-dropouts and Lyα emitters for which deep observations with HST and Spitzer exist. Yan et al. (2006) have analyzed a statistically robust sample and find stellar masses ranging from ~1 × 10^8 M⊙ for IRAC-undetected sources to ~7 × 10^10 M⊙ for the brightest 3.6 μm sources, and ages ranging from <40 to 400 Myr (see also Dow-Hygelund et al. 2005; Eyles et al. 2007; Lai et al. 2007, for additional comparison data). We also point out that the maximum stellar mass of ~7 × 10^10 M⊙ found in our mock survey (see top left panel) is comparable to that of the most massive i-dropouts found, and that "supermassive" galaxies with masses in excess of 10^11 M⊙ are absent in both the simulations and the observations (McLure et al. 2006). Last, in the bottom right panel we show the distribution of the masses of the halos hosting the model i-dropouts. The median halo mass is ~3 × 10^11 M⊙. Our results are in the range of values reported by Overzier et al. (2006) and McLure et al. (2008) based on the angular correlation function of large i775-dropout samples, but we note that halo masses are currently not very well constrained by the observations.

³ For reference: a z850 magnitude of ~26.5 mag for an unattenuated galaxy at z ~ 6 would correspond to an SFR of ~7 M⊙ yr^-1, under the widely used assumption of a 0.1-125 M⊙ Salpeter initial mass function and the conversion factor between SFR and the rest-frame 1500 Å UV luminosity of 8.0 × 10^27 erg s^-1 Hz^-1 / (M⊙ yr^-1) as given by Madau et al. (1998).

Figure 6. Number density versus halo mass of the z = 0 dark matter halos hosting descendants of i-dropouts at z ~ 6 to a limiting depth of z850 ≤ 26.5 mag. The median i-dropout descendant halo mass is a few times 10^13 M⊙ (dotted line). The halo mass function of all MR halos at z = 0 is indicated by the dashed line. The mass range occupied by the halos associated with galaxy clusters is indicated by the hatched region.



3 THE RELATION BETWEEN i-DROPOUTS AND (PROTO-)CLUSTERS

In this Section we study the relation between local overdensities in the i-dropout distribution at z ~ 6 and the sites of cluster formation. Throughout this paper, a galaxy cluster is defined as all galaxies belonging to a bound dark matter halo having a dark matter mass⁴ of M_tophat ≥ 10^14 M⊙ at z = 0. In the MR we find 2,832 unique halos, or galaxy clusters, fulfilling this condition, 21 of which can be considered supermassive (M_tophat > 10^15 h^-1 M⊙). Furthermore, a proto-cluster galaxy is defined as a galaxy at z ~ 6 that will end up in a galaxy cluster at z = 0. Note that these are trivial definitions given the database structure of the Millennium Run simulations, in which galaxies and halos at any given redshift can be related to their progenitors and descendants at another redshift (Lemson & Virgo Consortium 2006).

Figure 7. Number density versus stellar mass of the galaxies at z = 0 that have at least one i-dropout progenitor at z ~ 6. The median descendant mass is ~10^11 M⊙ (dotted line). The distribution of stellar mass of all MR z = 0 galaxies is indicated for comparison (dashed line).
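The proto-cluster definition above amounts to following each galaxy's descendant pointers down to z = 0 and checking the mass of the host halo there. The sketch below assumes a hypothetical in-memory table in which every galaxy stores the ID of its descendant and the tophat mass of its host halo; the field names are ours and are not the schema of the Millennium database, and the 10^14 M⊙ threshold is taken from the cluster definition in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Galaxy:
    galaxy_id: int
    descendant_id: Optional[int]  # None for z = 0 galaxies (end of the tree)
    host_mass: float              # host halo M_tophat in Msun at this redshift


def z0_descendant(gal_id: int, galaxies: Dict[int, Galaxy]) -> Galaxy:
    """Follow descendant pointers until a galaxy without a descendant (z = 0)."""
    gal = galaxies[gal_id]
    while gal.descendant_id is not None:
        gal = galaxies[gal.descendant_id]
    return gal


def is_protocluster_galaxy(gal_id: int, galaxies: Dict[int, Galaxy],
                           m_cluster: float = 1e14) -> bool:
    """True if the z = 0 descendant lives in a halo with M_tophat >= m_cluster."""
    return z0_descendant(gal_id, galaxies).host_mass >= m_cluster


# Toy merger tree: galaxies 1 and 2 (z ~ 6) merge into 3 and end up in cluster halo 5;
# galaxy 4 (z ~ 6) ends up in the group-sized halo of galaxy 6.
tree = {
    1: Galaxy(1, 3, 1e11),
    2: Galaxy(2, 3, 2e11),
    3: Galaxy(3, 5, 5e12),
    4: Galaxy(4, 6, 8e10),
    5: Galaxy(5, None, 3e14),
    6: Galaxy(6, None, 2e13),
}
print([g for g in (1, 2, 4) if is_protocluster_galaxy(g, tree)])  # [1, 2]
```

Because galaxies 1 and 2 share a descendant, counting unique z = 0 halos and counting proto-cluster galaxies give different numbers, which is the distinction drawn in Section 3.1.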



3.1 Properties of the z = 0 descendants of i-dropouts

In Fig. |6]we plot the distribution of number densities of the cen- 
tral halos that host the 2 = descendants of the i-dropouts in our 
mock survey as a function of the halo mass. The median halo mass 
hosting the i-dropout descendants at 2 = is 3 x 10^'' Mq h~^ 
(dotted line). For comparison we indicate the mass distribution of 
all halos at z = (dashed line). The plot shows that the fraction 
of halos that can be traced back to a halo hosting an i-dropout at 
z ~ 6 is a strong function of the halo mass at 2: = 0. 45% of 
all cluster-sized halos at 2 = (indicated by the hatched region) 
are related to the descendants of halos hosting i-dropouts in our 
mock survey, and 77% of all clusters at 2 = having a mass of 
A^> 7 X 10^'* Mq can be traced back to at least one progen- 
itor halo at 2 ~ 6 hosting an i-dropout. This implies that the first 
seeds of galaxy clusters are already present at 2 ~ 6. In addition, 
many i-dropout galaxies and their halos may merge and end up in 
the same descendant structures at 2 = 0, which was not accounted 



^ The 'tophat' mass, Mtophat^ is the mass within the radius at which the 
halo has an overdensity co rresponding to the value at virialisation in the 
top-hat collapse model (see lwhitel200ll) . 
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Figure 8. Projected distribution on tlie sky of tlie z ~ 6 j-dropouts selected from the MR mock survey according to the criteria 1775-2:850 > 1.3 and z 26.5 
mag (small and large points). Contours indicate regions of equal density, defined as 5S^/ = (E / — S^/ ) /S,/ , S / and S^/ being the local and mean surface 
density measured in circular cells of 5 radius. Over- and underdense regions of = ±[0.25, 0.5, 1.0] are shown in red and blue contours, respectively. 
The mean density (<5Sg/ = 0) is indicated by the green dashed contour. Large black points mark proto-cluster galaxies that end up in galaxy clusters at z = 0. 



for in our calculation above, where we only counted unique halos at z = 0. In fact, about 34% (2%) of all i-dropouts ($z_{850}\le26.5$) in the mock survey will end up in clusters of mass $>1\times10^{14}$ ($>7\times10^{14}$) $h^{-1}\,M_\odot$ at z = 0. This implies that roughly one third of all galaxies one observes in a typical i-dropout survey can be considered "proto-cluster" galaxies. The plot further shows that the majority of halos hosting i-dropouts at z ~ 6 will evolve into halos that are more typical of the group environment. This is similar to the situation found for Lyman break or dropout galaxies at lower redshifts (Ouchi et al. 2004).
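The difference between the two ways of counting used above (unique descendant halos versus individual i-dropouts, several of which may merge into the same halo) can be illustrated with a short sketch. It assumes a hypothetical, flattened merger-tree summary: a list giving the z = 0 descendant-halo ID of each i-dropout, and a dictionary of z = 0 halo masses. These names and the toy numbers are illustrative only and do not reflect the actual MR database schema.

```python
from typing import Dict, List


def fraction_of_clusters_traced(descendant_ids: List[int],
                                halo_mass: Dict[int, float],
                                mass_min: float) -> float:
    """Fraction of z = 0 halos with mass >= mass_min that have at least one
    progenitor halo hosting an i-dropout (each cluster counted once)."""
    clusters = {hid for hid, m in halo_mass.items() if m >= mass_min}
    if not clusters:
        return 0.0
    traced = clusters.intersection(descendant_ids)
    return len(traced) / len(clusters)


def fraction_of_dropouts_in_clusters(descendant_ids: List[int],
                                     halo_mass: Dict[int, float],
                                     mass_min: float) -> float:
    """Fraction of i-dropouts (every galaxy counted, even when several share
    one descendant) whose z = 0 descendant halo has mass >= mass_min."""
    if not descendant_ids:
        return 0.0
    n_in = sum(1 for hid in descendant_ids if halo_mass[hid] >= mass_min)
    return n_in / len(descendant_ids)


# Toy example with hypothetical masses (h^-1 M_sun): three dropouts merge into halo 2.
halo_mass = {1: 2e14, 2: 8e14, 3: 5e13, 4: 3e14}
descendant_ids = [2, 2, 2, 3, 3, 1, 3, 3, 3]
print(fraction_of_clusters_traced(descendant_ids, halo_mass, 1e14))       # 2 of 3 clusters
print(fraction_of_dropouts_in_clusters(descendant_ids, halo_mass, 1e14))  # 4 of 9 dropouts
```

The first function corresponds to the "fraction of clusters traced back" statistic, the second to the fraction of i-dropouts that end up in clusters; they differ precisely because mergers collapse several dropouts into one descendant.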

In Fig. 7 we plot the stellar mass distribution of those z = 0 galaxies that host the descendants of the i-dropouts. The present-day descendants are found in galaxies having a wide range of stellar masses ($M_*\sim10^{\ldots}\,M_\odot$), but the distribution is skewed towards the most massive galaxies in the MR simulations. The median stellar mass of the descendants is $\sim10^{\ldots}\,M_\odot$ (dotted line in Fig. 7).

3.2 Detecting proto-clusters at z ~ 6

We now examine to what extent local overdensities in the i-dropout distribution at z ~ 6 may trace the progenitor seeds of the richest clusters of galaxies in the present-day Universe. In Fig. 8 we plot the sky distribution of the i-dropouts in our 4.4° × 4.4° MR mock survey (large and small circles). Large circles indicate those i-dropouts identified as proto-cluster galaxies. We have plotted contours of i-dropout surface density, $\delta_{\Sigma,5'}=(\Sigma_{5'}-\bar{\Sigma}_{5'})/\bar{\Sigma}_{5'}$, with $\Sigma_{5'}$ and $\bar{\Sigma}_{5'}$ being the local and mean surface density measured in circular cells of 5' radius. Negative contours representing underdense regions are indicated by blue lines, while positive contours representing overdense regions are indicated by red lines. The green
Figure 9. Counts-in-cells frequency distribution of the i-dropouts shown in Fig. 8, based on 20,000 randomly placed ACS-sized fields of 3.4' × 3.4'. The panel on the right shows a zoomed-in view to give a better sense of the small fraction of pointings having large numbers of i-dropouts. In both panels, the thick solid line indicates the frequency distribution of the full MR mock survey. The dashed line indicates how the distribution changes if we "disrupt" all protocluster regions of Fig. 8 by randomizing the positions of the galaxies marked as proto-cluster galaxies. The dotted line indicates the frequency distribution of a large sample of i-dropouts selected from the GOODS survey by B06 using identical selection criteria. Thin solid lines indicate the Poisson distribution for a mean of 2 i-dropouts per pointing.




dashed lines indicate the mean density. The distribution of proto-cluster galaxies (large circles) correlates strongly with positive enhancements in the local i-dropout density distribution, indicating that these are the sites of formation of some of the first clusters. In Fig. 9 we plot the frequency distribution of the i-dropouts shown in Fig. 8 based on a counts-in-cells analysis of 20,000 randomly placed ACS-sized fields of 3.4' × 3.4' (solid histograms). On average, one expects to find about 2 i-dropouts in a random ACS pointing down to a depth of $z_{850}=26.5$, but the distribution is skewed with respect to a pure Poissonian distribution, as expected from the effects of gravitational clustering. The Poissonian expectation for a mean of 2 i-dropouts is indicated by a thin line for comparison. The panel on the right shows a zoomed-in view to give a better sense of the small fraction of pointings having large numbers of i-dropouts. Fig. 9 also shows the counts histogram derived from a similar analysis performed on i-dropouts extracted from the GOODS survey using the samples of B06. The GOODS result is indicated by the dotted histogram, showing that it lies much closer to the Poisson expectation than the MR mock survey. This is of course expected, as our mock survey covers an area over 200× larger than GOODS and includes a much wider range of environments. To illustrate that the (small) fraction of pointings with the largest number of objects is largely due to the presence of regions associated with proto-clusters, we effectively "disrupt" all proto-clusters by randomizing the positions of all protocluster galaxies and repeat the counts-in-cells calculation. The result is shown by the dashed histograms in Fig. 9. The excess counts have largely disappeared, indicating that they were indeed due to the proto-clusters. The counts still show a small excess over the Poissonian distribution due to the overall angular clustering of the i-dropout population.
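A counts-in-cells measurement of the kind used for Fig. 9, together with the "disruption" test, can be sketched as follows. This is a minimal illustration, not the code used for the paper: it assumes galaxy positions in arcmin within a square field, places square cells at random so that they lie fully inside the field, and compares the resulting frequency distribution with a Poisson distribution of the same mean. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) sky position in arcmin


def counts_in_cells(points: Sequence[Point], field_size: float,
                    cell_size: float = 3.4, n_cells: int = 20000,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Number of objects in randomly placed square cells (ACS-sized by default)
    that lie entirely inside a square field of side field_size."""
    rng = rng or random.Random()
    counts = []
    for _ in range(n_cells):
        x0 = rng.uniform(0.0, field_size - cell_size)
        y0 = rng.uniform(0.0, field_size - cell_size)
        counts.append(sum(1 for x, y in points
                          if x0 <= x < x0 + cell_size and y0 <= y < y0 + cell_size))
    return counts


def frequency_distribution(counts: Sequence[int]) -> Dict[int, float]:
    """Fraction of cells containing k objects, for every k that occurs."""
    hist = Counter(counts)
    return {k: hist[k] / len(counts) for k in sorted(hist)}


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson expectation for k objects in a cell, given the mean count."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def disrupt(points: Sequence[Point], is_protocluster: Sequence[bool],
            field_size: float, rng: random.Random) -> List[Point]:
    """Randomize the positions of protocluster members; field galaxies stay put."""
    return [(rng.uniform(0.0, field_size), rng.uniform(0.0, field_size)) if pc else p
            for p, pc in zip(points, is_protocluster)]
```

Comparing `frequency_distribution(counts)` with `poisson_pmf(k, mean_count)`, before and after calling `disrupt`, mirrors the solid, dashed and thin lines of Fig. 9. The brute-force loop scales as (number of cells) × (number of galaxies); a grid or tree lookup would be needed for a survey the size of the full mock.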

We can use our counts-in-cells analysis to predict the cumulative probability, $P_{\ge\delta}$, of randomly finding an i-dropout overdensity equal to or larger than $\delta_{\Sigma,\mathrm{ACS}}$. The results are shown in Fig. 10. The four panels correspond to the subsamples defined using the four different $i_{775}-z_{850}$ colour cuts (see §2.5 and Fig. 4). Panel insets show the full probability range for reference. The figure shows that the probability of finding, for example, cells having a surface



Figure 10. Panels show the cumulative probability distributions of finding regions having a surface overdensity $\ge\delta_{\Sigma,\mathrm{ACS}}$ of i-dropouts for the four samples extracted from the MR mock survey based on colour cuts of $i_{775}-z_{850}>1.3$ (top-left), $i_{775}-z_{850}>1.5$ (top-right), $1.3<i_{775}-z_{850}<2.0$ (bottom-left), and $i_{775}-z_{850}>2.0$ (bottom-right). The inset plots show the full probability distributions. Dashed, coloured lines indicate the joint probability of finding cells having an overdensity $\ge\delta_{\Sigma,\mathrm{ACS}}$ and those cells consisting of at least 25% (blue), 50% (green) and 75% (red) proto-cluster galaxies.



overdensity of i-dropouts of $>3$ is about half a percent for the $i_{775}-z_{850}>1.3$ sample (top left panel, solid line). The other panels show the dependence of $P_{\ge\delta}$ on i-dropout samples selected using different colour cuts. As the relative contribution from fore- and background galaxies changes, the density contrast between real, physical overdensities on small scales and the "field" is increased.
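Given the cell counts, the cumulative probability $P_{\ge\delta}$ follows directly. The sketch below assumes that the overdensity of a cell is defined relative to the mean count over all cells, analogous to the $\delta_{\Sigma,5'}$ definition used for Fig. 8; the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List, Sequence


def overdensities(counts: Sequence[int]) -> List[float]:
    """delta = (N - <N>) / <N> for each cell, relative to the mean count over all cells."""
    mean = sum(counts) / len(counts)
    return [(n - mean) / mean for n in counts]


def cumulative_probability(deltas: Iterable[float], thresholds: Iterable[float]) -> List[float]:
    """P(delta >= t) for each threshold t: the fraction of cells at or above t."""
    ordered = sorted(deltas)
    n = len(ordered)
    # bisect_left gives the number of cells strictly below t
    return [(n - bisect_left(ordered, t)) / n for t in thresholds]
```

For the $i_{775}-z_{850}>1.3$ sample, a value $P_{\ge3}\approx0.5\%$ corresponds to roughly 100 of 20,000 random ACS-sized cells.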

The results presented in Fig. 10 provide us with a powerful way to interpret many observational findings. Specifically, overdensities of i-dropouts have been interpreted as evidence for large-scale structure associated with proto-clusters, at least qualitatively. Although Fig. 10 tells us the likelihood of finding a given overdensity, this is not sufficient by itself to answer the question whether that overdensity is related to a proto-cluster, due to a combination of several effects. First, because we are mainly working with photometrically selected samples consisting of galaxies spanning about one unit in redshift, projection effects are bound to give rise to a range of surface densities. Second, the number counts may show significant variations as a function of position and environment resulting from the large-scale structure. The uncertainties due to cosmic variance can be reduced by observing fields that are larger than the typical scale length of the large-scale structures, but this is often not achieved in typical observations at z ~ 6. Third, surface overdensities that are related to genuine overdensities in physical coordinates are not necessarily due to proto-clusters, as we have shown that the descendants of i-dropouts can be found in a wide range of environments at z = 0, galaxy groups being the most common (see Fig. 6). We have separated the contribution of these effects to






[Figure 12: sixteen 30' × 30' panels; horizontal axis Right Ascension (arcmin), from −15 to +15.]

Figure 12. Panels show the angular distribution of i775-dropouts in 30' × 30' areas centered on each of 16 protoclusters associated with overdensities $\delta_{\Sigma,\mathrm{ACS}}\ge3$. Field galaxies are drawn as open circles, and cluster galaxies as filled circles that are colour coded according to their cluster membership. The ACS field-of-view is indicated by a red square. Numbers near the top of each panel indicate the ID, redshift, overdensity and cluster mass (at z = 0) of the protoclusters in the center of each panel.



$P_{\ge\delta}$ from that due to proto-clusters by calculating the fraction of actual proto-cluster i-dropouts in each cell of given overdensity $\delta$. The results are also shown in Fig. 10, where dashed histograms indicate the combined probability of finding a cell of overdensity $\ge\delta$ consisting of more than 25% (blue lines), 50% (green lines) and 75% (red lines) protocluster galaxies. The results show that while, for example, the chance $P(\delta\ge2.5)$ is about 1%, the chance that at least 50% of the galaxies in such cells are proto-cluster galaxies is only half that, about 0.5% (see top left panel in Fig. 10). The figure goes on to show that the fraction of protocluster galaxies increases significantly as the overdensity increases, indicating that the largest (and rarest) overdensities in the i-dropout distribution are related to the richest proto-cluster regions. This is further illustrated in Fig. 11, in which we plot the average and scatter of the fraction of protocluster galaxies as a function of $\delta$. Although the fraction rises as the overdensity increases, there is a very large scatter. At $\delta\sim4$ the average fraction of protocluster galaxies is about 0.5, but varies significantly between 0.25 and 0.75 (1σ).
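The joint statistic plotted as dashed lines in Fig. 10, and the binned purity trend of Fig. 11, can both be computed from per-cell counts once each galaxy carries a protocluster flag, as it does in the mock survey. The sketch below is a hypothetical illustration; in particular, assigning zero purity to empty cells and binning $\delta$ by its lower bin edge are choices made here, not taken from the text.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (number of i-dropouts, number of those flagged as protocluster members)


def _mean_count(cells: Sequence[Cell]) -> float:
    return sum(n for n, _ in cells) / len(cells)


def joint_probability(cells: Sequence[Cell], delta_min: float, purity_min: float) -> float:
    """Fraction of all cells with overdensity >= delta_min AND protocluster fraction >= purity_min."""
    mean_n = _mean_count(cells)
    hits = 0
    for n, n_pc in cells:
        delta = (n - mean_n) / mean_n
        purity = n_pc / n if n > 0 else 0.0
        if delta >= delta_min and purity >= purity_min:
            hits += 1
    return hits / len(cells)


def purity_vs_overdensity(cells: Sequence[Cell],
                          bin_width: float = 1.0) -> Dict[float, Tuple[float, float]]:
    """Mean and standard deviation of the protocluster fraction in bins of overdensity,
    keyed by the lower bin edge; empty cells are skipped."""
    mean_n = _mean_count(cells)
    bins: Dict[int, List[float]] = {}
    for n, n_pc in cells:
        if n == 0:
            continue
        delta = (n - mean_n) / mean_n
        bins.setdefault(math.floor(delta / bin_width), []).append(n_pc / n)
    return {k * bin_width: (mean(v), pstdev(v)) for k, v in sorted(bins.items())}
```

Calling `joint_probability(cells, 2.5, 0.5)` on the mock cells corresponds to the ~0.5% quoted above for the top left panel of Fig. 10.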



It will be virtually impossible to estimate an accurate cluster mass at z = 0 from a measured surface overdensity at z ~ 6. Although there is a correlation between cluster mass at z = 0 and i-dropout overdensity at z ~ 6, the scatter is significant. Many of the most massive ($M>10^{15}\,M_\odot$) clusters have very small associated overdensities, while the progenitors of fairly low mass clusters ($M\sim10^{14}\,M_\odot$) can be found associated with regions of relatively large overdensities. However, the largest overdensities are consistently associated with the progenitors of $5\times10^{14}$–$1\times10^{15}\,M_\odot$ clusters.



3.3 Some examples 

Although the above sections yield useful statistical results, it is interesting to look at the detailed angular and redshift distributions of the i775-dropouts in a few of the overdense regions. In Fig. 12 we show 16 30' × 30' regions having overdensities ranging from
Figure 14. Panels show redshift versus one of the angular coordinates of i775-dropouts for each of the protocluster regions shown in Fig. 12. Field galaxies are drawn as open circles, and cluster galaxies as filled circles that are colour coded according to their cluster membership as in Fig. 12. Red dashed lines mark z = 5.9, which roughly corresponds to, respectively, the upper and lower redshift of samples selected by placing a cut at $i_{775}-z_{850}\lesssim2$ and $i_{775}-z_{850}\gtrsim2$. Blue dotted lines mark the redshift range ($\Delta z\approx0.1$) probed by narrowband Lyα filters.



$\delta_{\Sigma,\mathrm{ACS}}\sim8$ (bottom left panel) to 3 (top right panel). In each panel we indicate the relative size of an ACS pointing (red square), and the redshift, overdensity and present-day mass of the most massive protoclusters are given in the top left and right corners. Field galaxies are drawn as open circles, while protocluster galaxies are drawn as filled circles. Galaxies belonging to the same proto-cluster are drawn in the same colour. While some regions contain relatively compact protoclusters with numerous members inside the 3.4' × 3.4' ACS field-of-view (e.g. panels #0, 1 and 8), other regions may contain very few or highly dispersed galaxies. Also, many regions contain several overlapping protoclusters, as the selection function is sensitive to structures from a relatively wide range in redshift inside the 30' × 30' regions plotted. Although the angular separation between galaxies belonging to the same protocluster is typically smaller than ~10' or 25 Mpc (comoving), Fig. 13 shows that the overdensities of regions centred on the protoclusters are significantly positive out to much larger radii of between 10' and 30',



indicating that the protoclusters form inside very large filaments of up to 100 Mpc in size that contribute significantly to the overall (field) number counts in the protocluster regions. In Fig. 14 we plot the redshift coordinate against one of the angular coordinates using the same regions and colour codings as in Fig. 12. Protoclusters are significantly more clumped in redshift space compared to field galaxies, due to flattening of the velocity field associated with the collapse of large structures. In each panel, a red dashed line marks z = 5.9, which roughly corresponds to, respectively, the upper and lower redshift of samples selected by placing a cut at $i_{775}-z_{850}\lesssim2$ and $i_{775}-z_{850}\gtrsim2$ (see the redshift selection functions in Fig. 4). Such colour cuts may help reduce the contribution from field galaxies by about 50%, depending on the redshift one is interested in. We also mark, using blue dotted lines, the typical redshift range of $\Delta z\sim0.1$ probed by narrowband filters centred on the redshift of each protocluster. As we will show in more detail in §4.2 below, such narrowband selections offer one of the most promising methods for
Figure 11. The average fraction of i-dropouts marked as proto-cluster galaxies contained in ACS-sized cells as a function of cell overdensity. Error bars are 1σ. There is a clear trend showing that larger surface overdensities are associated with a larger contribution from galaxies in proto-clusters, albeit with significant scatter.
Figure 13. Lines show overdensity as a function of radius for each of the protocluster regions shown in Fig. 12.



finding and studying the earliest collapsing structures at high redshift, because of the significant increase in contrast between cluster and field regions. However, such surveys are time-consuming and only probe the part of the galaxy population that is bright in the Lyα line.
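The correspondence quoted above between an angular separation of ~10' and ~25 comoving Mpc at z ~ 6, and the line-of-sight depth of a narrowband slice with Δz ~ 0.1, can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with the Millennium Run parameters (Ω_m = 0.25, Ω_Λ = 0.75, h = 0.73), which are not restated in this section, and integrates the comoving distance with Simpson's rule.

```python
import math

# Assumed flat LCDM parameters of the Millennium Run: Omega_m, Omega_Lambda, h
OMEGA_M, OMEGA_L, HUBBLE_H = 0.25, 0.75, 0.73
C_KM_S = 299792.458
D_HUBBLE_MPC = C_KM_S / (100.0 * HUBBLE_H)  # c / H0 in Mpc (not h^-1 Mpc)


def e_of_z(z: float) -> float:
    """Dimensionless Hubble parameter H(z)/H0 for a flat universe (radiation ignored)."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance_mpc(z: float, steps: int = 2000) -> float:
    """Line-of-sight comoving distance to redshift z via Simpson's rule (steps must be even)."""
    dz = z / steps
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight / e_of_z(i * dz)
    return D_HUBBLE_MPC * total * dz / 3.0


def transverse_mpc_per_arcmin(z: float) -> float:
    """Comoving transverse size of 1 arcmin (flat universe: D_C times the angle in radians)."""
    return comoving_distance_mpc(z) * math.radians(1.0 / 60.0)


def slice_depth_mpc(z: float, dz: float) -> float:
    """Approximate comoving depth of a thin redshift slice dz centred on z."""
    return D_HUBBLE_MPC / e_of_z(z) * dz


print(f"10 arcmin at z = 6: {10 * transverse_mpc_per_arcmin(6.0):.1f} comoving Mpc")
print(f"Delta z = 0.1 at z = 5.7: {slice_depth_mpc(5.7, 0.1):.1f} comoving Mpc")
```

Under these assumptions, 10' at z = 6 comes out at roughly 24 comoving Mpc, consistent with the ~25 Mpc quoted above, and a Δz = 0.1 narrowband slice near z = 5.7 spans roughly 47 comoving Mpc along the line of sight.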



4 COMPARISON WITH OBSERVATIONS FROM THE 
LITERATURE 

Our mock survey of i-dropouts constructed from the MR, due to its large effective volume, spans a wide range of environments and is therefore ideal for making detailed comparisons with observational studies of the large-scale structure at z ~ 6. In the following subsections, we will make such comparisons with two studies of candidate proto-clusters of i-dropouts and Lyα emitters found in the SDF and SXDF.



4.1 The candidate proto-cluster of Ota et al. (2008)

When analysing the sky distribution of i-dropouts in the 876 arcmin$^2$ Subaru Deep Field, Ota et al. (2008; henceforth O08) discovered a large surface overdensity, presumed to be a proto-cluster at z ~ 6. The magnitude of the overdensity was quantified as the excess of i-dropouts in a circle of 20 Mpc comoving radius. The region had $\delta_{\Sigma,20\mathrm{Mpc}}=0.63$ with 3σ significance. Furthermore, this region also contained the highest density contrast measured in an 8 Mpc comoving radius, $\delta_{\Sigma,8\mathrm{Mpc}}=3.6$ (5σ), compared to other regions of the SDF. By relating the total overdensity in dark matter to the measured overdensity in galaxies through an estimate of the galaxy bias parameter, the authors estimated a mass for the proto-cluster region of $\sim1\times10^{\ldots}\,M_\odot$.

We use our mock survey to select i-dropouts with $i_{775}-z_{850}>1.5$ and $z_{850}<26.5$, similar to O08. The resulting surface density was 0.16 arcmin$^{-2}$, in very good agreement with the value of 0.18 arcmin$^{-2}$ found by O08. In Fig. 15 we plot the sky distribution of our sample, and connect regions of constant (positive) density $\delta_{\Sigma,20\mathrm{Mpc}}$. Next we selected all regions that had $\delta_{\Sigma,20\mathrm{Mpc}}\ge0.63$. These regions are indicated by the large red circles in Fig. 15. We find ~30 (non-overlapping) regions in our entire mock survey having $\delta_{\Sigma,20\mathrm{Mpc}}=0.6$–2.0 at 2–7σ significance, relative to the mean dropout density of $\bar{\Sigma}_{20\mathrm{Mpc}}\sim32$. Analogous to Fig. 8, we have marked all i-dropouts associated with proto-clusters with large symbols. It can be seen clearly that the proto-cluster galaxies are found almost exclusively inside the regions of enhanced local surface density indicated by the contour lines, while the large void regions are virtually depleted of proto-cluster galaxies. Although the 30 regions of highest overdensity selected to be similar to the region found by O08 coincide with the highest peaks in the global density distribution across the field, it is interesting to point out that in some cases the regions contain very few actual proto-cluster galaxies, e.g., the regions at (RA, Dec) = (10,150) and (80,220) in
Fig. 15. We therefore introduce a proto-cluster "purity" parameter, $\mathcal{R}_{\rm pc}$, defined as the ratio of the number of galaxies in a (projected) region that belong to protoclusters to the total number of galaxies in that region. We find $\mathcal{R}_{\rm pc,20Mpc}\sim16$–50%. The purest or richest proto-clusters are found in regions having a wide range in overdensities, e.g., the region at (175,225) with $\delta_{\Sigma,20\mathrm{Mpc}}=2.2$ and $\mathcal{R}_{\rm pc,20Mpc}=50\%$, and the region at (200,40) with $\delta_{\Sigma,20\mathrm{Mpc}}=0.9$ and $\mathcal{R}_{\rm pc,20Mpc}=40\%$. Following O08 we also calculate the maximum overdensity in each region using cells of 8 Mpc radius. We find $\delta_{\Sigma,8\mathrm{Mpc}}=1.1$–3.5 with 2–6σ significance. These sub-regions are indicated in Fig. 15 using smaller circles. Interestingly, there is a very wide range in proto-cluster purity of $\mathcal{R}_{\rm pc,8Mpc}\sim0$–80%. The largest overdensity in Fig. 15, at (175,225), corresponds to the region giving birth to the most massive cluster. By z = 0, this region has grown into a "supercluster" region containing numerous clusters, two of which have $M>10^{\ldots}\,M_\odot$.
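For a single circular region, the overdensity and the purity $\mathcal{R}_{\rm pc}$ defined above can be computed together. A minimal sketch, assuming projected positions in comoving Mpc, a protocluster flag per galaxy (available in the mock survey, but not for real photometric data), and a mean surface density known from the full survey; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Galaxy = Tuple[float, float, bool]  # (x, y) in projected comoving Mpc, protocluster flag


def region_overdensity_and_purity(galaxies: Sequence[Galaxy],
                                  centre: Tuple[float, float],
                                  radius: float,
                                  mean_surface_density: float) -> Tuple[float, float]:
    """Return (delta_Sigma, R_pc) for a circle of the given radius, where

    delta_Sigma = N_inside / (mean_surface_density * pi * radius^2) - 1
    R_pc        = (protocluster members inside) / (all galaxies inside)
    """
    cx, cy = centre
    flags = [pc for x, y, pc in galaxies if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    expected = mean_surface_density * math.pi * radius ** 2
    delta = len(flags) / expected - 1.0
    purity = sum(flags) / len(flags) if flags else 0.0
    return delta, purity
```

Evaluated with radii of 20 and 8 Mpc, this gives the pairs ($\delta_{\Sigma,20\mathrm{Mpc}}$, $\mathcal{R}_{\rm pc,20Mpc}$) and ($\delta_{\Sigma,8\mathrm{Mpc}}$, $\mathcal{R}_{\rm pc,8Mpc}$) discussed above.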

We conclude that local overdensities in the distribution of i-dropouts on scales of ~10–50 comoving Mpc, similar to the one found by O08, indeed trace the seeds of massive clusters. Because our mock survey is about 80× larger than the SDF, we expect that one would encounter such proto-cluster regions in about one in three (2.7) SDF-sized fields on average. However, the fraction of actual proto-cluster galaxies is in the range 16–50% (0–80% for 8 Mpc radius regions). This implies that while one can indeed find the overdense regions where clusters are likely to form, there is no way
Figure 15. The sky distribution of i-dropouts selected using criteria matched to those of Ota et al. (2008). Grey solid lines are surface density contours of $\delta_{\Sigma,20\mathrm{Mpc}}$ = 0, +0.2, +0.4, +0.6, +0.8 and +1.0. Large red dashed circles mark overdense regions of $\delta_{\Sigma,20\mathrm{Mpc}}\ge0.63$, corresponding to overdensities similar to that associated with the candidate z ~ 6 proto-cluster region found by Ota et al. (2008) in the Subaru Deep Field. Small red circles inside each region mark a subregion having the largest overdensity, $\delta_{\Sigma,8\mathrm{Mpc}}$, measured in an 8 Mpc co-moving radius (projected) cell (see text for further details).

of verifying which galaxies are part of the proto-cluster and which are not, at least not when using photometrically selected samples. These results are consistent with our earlier finding that there is a large scatter in the relation between the measured surface overdensity and both cluster "purity" and the mass of its descendant cluster at z = 0 (Sect. 3.2).
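The "about 80×" and "one in 2.7 fields" figures quoted above follow from simple arithmetic, sketched below using the 4.4° × 4.4° mock survey area, the 876 arcmin² SDF area and the ~30 overdense regions found in the mock.

```python
MOCK_SIDE_ARCMIN = 4.4 * 60.0   # side of the 4.4 deg x 4.4 deg mock survey
SDF_AREA_ARCMIN2 = 876.0        # Subaru Deep Field area analysed by O08
N_REGIONS = 30                  # approximate number of O08-like regions in the mock

mock_area = MOCK_SIDE_ARCMIN ** 2             # arcmin^2
area_ratio = mock_area / SDF_AREA_ARCMIN2     # number of SDF-sized fields in the mock
fields_per_region = area_ratio / N_REGIONS    # SDF-sized fields per O08-like region

print(f"mock area = {mock_area:.0f} arcmin^2, ratio = {area_ratio:.1f}, "
      f"fields per region = {fields_per_region:.2f}")
```

This gives an area ratio of about 79.6 and about 2.65 SDF-sized fields per region; rounding the ratio to 80 gives the 2.7 quoted in the text.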

4.2 The Lyα-selected proto-cluster of Ouchi et al. (2005)

The addition of velocity information gives studies of Lyα samples a powerful edge over purely photometrically selected i775-dropout samples. As explained by Monaco et al. (2005, and references therein), peculiar velocity fields are influenced by the large-scale structure: streaming motions can shift the overall distribution in redshift, while the dispersion can both increase and decrease as a result of velocity gradients. Galaxies located in different structures that are not physically bound will have higher velocity dispersions, while galaxies that are in the process of falling together to form non-linear structures such as filaments, sheets (or "pancakes") and proto-clusters will have lower velocity dispersions.

Using deep narrow-band imaging observations of the SXDF, Ouchi et al. (2005; O05) were able to select Lyα candidate galaxies at z ~ 5.7 ± 0.05. Follow-up spectroscopy of the candidates in one region that was found to be significantly overdense ($\delta>3$) on a scale of 8 Mpc (comoving) radius resulted in the discovery of two groups ('A' and 'B') of Lyα emitting galaxies, each having a very narrow velocity dispersion of < 200 km s$^{-1}$. The three-dimensional density contrast is of order ~100, comparable



Figure 16. Mock Lyα survey at z ~ 5.8 ± 0.05 constructed from the MR mock i775-dropout sample. Grey solid lines are surface density contours of $\delta_{\Sigma,20\mathrm{Mpc}}=-0.25$ to 3.25 with a step increase of 0.5, as in Fig. 2 of Ouchi et al. (2005). The black dashed line marks the average field density. Small circles indicate field galaxies. Large circles indicate protocluster galaxies.

to that of present-day clusters, and the space density of such proto- 
cluster regions is roughly consistent with that of massive clusters 
(see O05). 

In order to study the velocity fields of collapsing structures and carry out a direct comparison with O05, we construct a simple Lyα survey from our mock sample as follows. First, we construct a (Gaussian) redshift selection function centred on z = 5.8 with a standard deviation of 0.04. As it is not known what causes some galaxies to be bright in Lyα and others not, our simulations do not include a physical prescription for Lyα as such. However, empirical results suggest that Lyα emitters are mostly young, relatively dust-free objects and a subset of the i775-dropout population. The fraction of galaxies with high equivalent width Lyα is about 40%, and this fraction is found to be roughly constant as a function of the rest-frame UV continuum magnitude. Therefore, we scale our selection function so that it has a peak selection efficiency of 40%. Next, we apply this selection function to the i775-dropouts from the mock survey to create a sample with a redshift distribution similar to that expected from a narrowband Lyα survey. Finally, we tune the limiting $z_{850}$ magnitude until we find a number density that is similar to that reported by O05. By setting $z_{850}<26.9$ mag we get the desired number density of ~0.1 arcmin$^{-2}$. The mock Lyα field is shown in Fig. 16.
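The three-step construction of the mock Lyα sample described above can be sketched as follows. The Gaussian parameters (centre z = 5.8, σ = 0.04, peak efficiency 0.4) and the $z_{850}<26.9$ limit are taken from the text; representing each galaxy as a (redshift, $z_{850}$ magnitude) pair and applying the selection efficiency through a random draw are assumptions of this illustration, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Galaxy = Tuple[float, float]  # (redshift, z850 magnitude)


def lya_selection_probability(z: float, z0: float = 5.8, sigma_z: float = 0.04,
                              peak: float = 0.4) -> float:
    """Gaussian redshift selection function scaled to a peak efficiency of 40%."""
    return peak * math.exp(-0.5 * ((z - z0) / sigma_z) ** 2)


def mock_lya_sample(dropouts: Sequence[Galaxy], mag_limit: float = 26.9,
                    rng: Optional[random.Random] = None) -> List[Galaxy]:
    """Keep each i775-dropout brighter than mag_limit with probability P_sel(z)."""
    rng = rng or random.Random()
    return [(z, mag) for z, mag in dropouts
            if mag < mag_limit and rng.random() < lya_selection_probability(z)]
```

In the procedure described above, `mag_limit` would be adjusted until the surface density of the output sample reaches ~0.1 arcmin$^{-2}$.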

In the top left panel of Fig. 17 we plot the overdensities measured in randomly drawn regions of 8 Mpc (comoving) radius against the protocluster purity parameter, analogous to Fig. 11. Although the median purity of a sample increases with overdensity (dashed line), the scatter indicated by the points is very large even for overdensities as large as $\delta\sim3$ found by O05 (marked by the shaded region in the top panel of Fig. 17). To guide the eye, we have plotted regions of purity > 0.5 as red points, and regions having purity > 0.5 and $\delta>3$ as blue points in all panels of Fig. 17. Next, we calculate the velocity dispersion, $\sigma_{v,\mathrm{bi}}$, from the peculiar velocities of the galaxies in each region using the bi-weight estimator of Beers et al. (1990) that is robust for relatively small numbers of objects (N ~ 10–50), and plot the result against $\delta$ and cluster purity in the top right and bottom left panels of Fig. 17, respectively.

Figure 17. The correlations between surface overdensity, cluster purity and velocity dispersion for Lyα galaxies selected from the mock Lyα survey shown in Fig. 16 using randomly drawn cells of 8 Mpc (comoving) radius. Dashed lines indicate the median trends. Red points highlight regions of purity $\mathcal{R}>0.5$. Blue points highlight regions of $\mathcal{R}>0.5$ and $\delta_\Sigma>3$. Shaded areas mark the values obtained by Ouchi et al. (2005) for a protocluster of Lyα galaxies in the SXDF. See text for details.
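The bi-weight scale estimator referred to above can be written in a few lines. This sketch follows the commonly used form of the Beers et al. (1990) estimator, with the median as the location, the median absolute deviation (MAD) setting the weights, and a tuning constant c = 9. Whether the paper used exactly these choices (for example, the median rather than an iterated bi-weight location) is not stated in this section, so treat them as assumptions.

```python
import math
from statistics import median
from typing import Sequence


def biweight_scale(values: Sequence[float], c: float = 9.0) -> float:
    """Bi-weight scale (dispersion) estimate in the form of Beers, Flynn & Gebhardt (1990).

    u_i = (x_i - M) / (c * MAD); points with |u_i| >= 1 receive zero weight:
    S_BI = sqrt(n * sum (x_i - M)^2 (1 - u_i^2)^4) / |sum (1 - u_i^2)(1 - 5 u_i^2)|
    """
    n = len(values)
    m = median(values)
    mad = median(abs(x - m) for x in values)
    if mad == 0:
        return 0.0
    num = 0.0
    den = 0.0
    for x in values:
        u = (x - m) / (c * mad)
        if abs(u) < 1.0:
            w = 1.0 - u * u
            num += (x - m) ** 2 * w ** 4
            den += w * (1.0 - 5.0 * u * u)
    return math.sqrt(n * num) / abs(den)
```

Applied to the peculiar velocities (km s$^{-1}$) of the galaxies in each cell, this gives $\sigma_{v,\mathrm{bi}}$; unlike the standard deviation, it is barely affected by a few interlopers far from the bulk of the distribution.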

Although gravitational clumping of galaxies in redshift space causes the velocity dispersions to be considerably lower than the velocity width allowed by the bandpass of the narrowband filter ($\langle\sigma_{v,\mathrm{bi}}\rangle\sim1000$ km s$^{-1}$ compared to $\sigma_{\rm NB}\sim1800$ km s$^{-1}$ for $\sigma_{{\rm NB},z}=0.04$), the velocity dispersion is not a decreasing function of the overdensity (at least not up to $\delta\sim3$–4) and the scatter is significant. This can be explained by the fact that proto-cluster regions are rare, and even regions that are relatively overdense in angular space still contain many galaxies that are not contained within a single bound structure. A much stronger correlation is found between dispersion and cluster purity (see bottom left panel of Fig. 17). Although the scatter in dispersion is large for regions with a purity of < 0.5, the smallest dispersions are associated with some of the richest protocluster regions. This can be understood because the "purest" structures represent the bound inner cores of future clusters at z = 0. The velocity dispersions are low because these systems do not contain many field galaxies that act to inflate the velocity dispersion measurements. Therefore, the velocity dispersion correlates much more strongly with the protocluster purity than with the surface overdensity. The overdensity parameter helps, however, in reducing some of the ambiguity in the cluster richness at small dispersions (compare black and blue points at small $\sigma_{v,\mathrm{bi}}$ in the bottom left panel). The shaded regions in Fig. 17 indicate the range of measurements of O05, implying that their structure has the characteristics of Lyα galaxies falling into a protocluster at z ~ 6.
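The ~1800 km s$^{-1}$ velocity width quoted above for the narrowband filter follows from converting a redshift width into a velocity, $\sigma_v = c\,\sigma_z/(1+z)$. A minimal sketch, assuming the filter is centred at z = 5.8 as in the mock Lyα survey:

```python
C_KM_S = 299792.458  # speed of light in km/s


def redshift_width_to_velocity(sigma_z: float, z: float) -> float:
    """Velocity dispersion (km/s) corresponding to a redshift dispersion sigma_z at redshift z."""
    return C_KM_S * sigma_z / (1.0 + z)


def velocity_to_redshift_width(sigma_v: float, z: float) -> float:
    """Inverse conversion: redshift dispersion corresponding to sigma_v (km/s) at redshift z."""
    return sigma_v * (1.0 + z) / C_KM_S


sigma_nb = redshift_width_to_velocity(0.04, 5.8)      # narrowband selection width, km/s
sigma_group = velocity_to_redshift_width(200.0, 5.7)  # O05-like group dispersion, in redshift
print(f"sigma_NB ~ {sigma_nb:.0f} km/s; 200 km/s at z = 5.7 ~ {sigma_group:.4f} in redshift")
```

The result, about 1760 km s$^{-1}$, matches the ~1800 km s$^{-1}$ quoted above, and shows that groups with dispersions < 200 km s$^{-1}$, such as those of O05, occupy a redshift interval of only ~0.0045.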



5 WHERE IS THE LARGE-SCALE STRUCTURE 
ASSOCIATED WITH Z ~ 6 QSOS? 

For reasons explained in the Introduction, it is generally assumed that the luminous QSOs at z ~ 6 inhabit the most massive dark matter halos in the early Universe. The HST/ACS, with its deep and wide-field imaging capabilities in the $i_{775}$ and $z_{850}$ bands, has made it possible to test one possible implication of this by searching for small neighbouring galaxies tracing these massive halos. In this Section, we will first investigate what new constraints we can put on the masses of the host halos based on the observed neighbour statistics. Muñoz & Loeb (2008a) have addressed the same problem based on the excursion set formalism. Our analysis is based on semi-analytic models incorporated in the MR simulation, which we believe is likely to provide a more realistic description of galaxy properties at z ~ 6. We will use the simulations to evaluate what we can say about the most likely environment of the QSOs and whether they are associated with proto-clusters. We finish the Section by presenting some clear examples from the simulations that would signal a massive overdensity in future observations.

Several searches for companion galaxies in the vicinity of z ~ 6 QSOs have been carried out to date. In Table 1 we list the main surveys, covering in total 6 QSOs spanning the redshift range 5.8 < z < 6.4. We have used the results given in Stiavelli et al. (2005), Zheng et al. (2006) and Kim et al. (2008) to calculate the surface overdensities associated with each of the QSO fields listed in Table 1. Only two QSOs were found to be associated with positive overdensities to a limiting magnitude of $z_{850}=26.5$: J0836+0054 (z = 5.82) and J1030+0524 (z = 6.28) both had $\delta_{\Sigma,\mathrm{ACS}}\approx1$, although evidence suggests that the overdensity could be as high as ≈2–3 when taking into account subclustering within the ACS field or sources selected using different S/N or colour cuts (see Stiavelli et al. 2005; Zheng et al. 2006; Ajiki et al. 2006; Kim et al. 2008, for details). The remaining four QSO fields (J1306+0356 at z = 5.99, J1630+4012 at z = 6.05, J1048+4637 at z = 6.23, and J1148+5251 at z = 6.43) were all consistent with having no excess counts, with $\delta_{\Sigma,\mathrm{ACS}}$ spanning the range from about −1 to +0.5, relatively independent of the method of selection (Kim et al. 2008). Focusing on the two overdense QSO fields, Fig. 10 tells us that overdensities of $\delta_{\Sigma,\mathrm{ACS}}\gtrsim1$ are fairly common, occurring at a rate of about 17% in our 4° × 4° simulation. The probability of finding a random field with $\delta_{\Sigma,\mathrm{ACS}}>2$–3 is about 5 to 1%. None of the six quasar fields thus shows a highly significant overdensity. The case for overdensities near the QSOs would strengthen if all fields showed a systematically higher, even if moderate, surface density. However, when considering the sample as a whole, the surface densities of i-dropouts near z ~ 6 QSOs are fairly average, given that four of the QSO fields have lower or similar number counts compared to the field. With the exception perhaps of the field towards the highest redshift QSO J1148+5251, which lies at a redshift where the i-dropout selection is particularly inefficient (see Fig. 4), the lack of evidence for substantial (surface) overdensities in the QSO fields is puzzling.

In Fig. 18 we have plotted the number of i775-dropouts encountered in cubic regions of $20\times20\times20\,h^{-3}\,\mathrm{Mpc}^3$ against the mass of the most massive dark matter halo found in each region. Panels on the left and on the right are for limiting magnitudes



The significance of the overdensity in the J0836+0054 field is less than originally stated by Zheng et al. (2006), as a result of underestimating the contamination rate when a $V_{606}$ image is not available to reject lower redshift interlopers.
Figure 18. Panels show the number of neighbours (i775-dropouts) in cubic regions of $(20\,h^{-1}\,\mathrm{Mpc})^3$ versus the mass of the most massive halo found in each of those regions. Top and bottom panels are for snapshots at z = 5.7 and z = 6.2, respectively. Left and right panels are for neighbour counts down to limiting magnitudes of $z_{850}=27.5$ (left) and $z_{850}=26.5$ mag (right). There is a wide dispersion in the number of neighbours, even for the most massive halos at z ~ 6. The highest numbers of neighbours are exclusively associated with the massive end of the halo mass function, allowing one to derive a lower limit for the mass of the most massive halo for a given number of neighbour counts. The scatter in the number of neighbours versus the mass of the most massive halo reduces significantly when going to fainter magnitudes. The small squares in the panels on the right correspond to the three richest regions (in terms of $z_{850}<26.5$ mag dropouts) that are shown in close-up in Fig. 19.



of z850 = 27.5 and 26.5 mag, respectively. Because the most massive halos are so rare, here we have used the full MR snapshots at z = 5.7 (top panels) and z = 6.2 (bottom panels) rather than the lightcone in order to improve the statistics. There is a systematic increase in the number of neighbours with increasing maximum halo mass. However, the scatter is very large: for example, focusing on the neighbour count prediction for z = 5.7 and z850 < 26.5 (top right panel), we see that the number of neighbours of a halo of 10^[…] M_⊙ can be anywhere between […] and 20, and some of the most massive halos of 10^[…] M_⊙ have relatively low counts compared to some of the far more numerous halos of significantly lower mass. However, for a z850 < 26.5 neighbour count (in a 20 × 20 × 20 h^-3 Mpc^3 region) of > 5, the halo mass is always above ~10^[…] M_⊙, and if one were to observe > 25 i775-dropouts one could conclude that the field must contain a supermassive halo of > 10^[…] M_⊙. Thus, in principle, one can only estimate a lower limit on the maximum halo mass as a function of the neighbour counts. The left panels show that the scatter is much reduced if we are able to count galaxies to a limiting z850-band magnitude of 27.5 instead of 26.5, simply because the Poisson noise is greatly reduced.
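The lower-limit argument above can be made concrete with a small sketch. It assumes a hypothetical list of simulated regions, each carrying a neighbour count and the log mass of its most massive halo (the toy values are invented for illustration, not taken from the MR). For an observed count n, it returns the smallest halo mass among all regions with at least n neighbours, which is the kind of lower limit one reads off Fig. 18.

```python
from typing import Optional, Sequence


def min_mass_above_count(counts: Sequence[int], max_halo_logm: Sequence[float],
                         n_obs: int) -> Optional[float]:
    """Lower limit on the most massive halo implied by an observed neighbour count.

    Among all simulated regions with at least n_obs neighbours, return the smallest
    log mass of the most massive halo. Returns None if no region reaches n_obs.
    """
    if len(counts) != len(max_halo_logm):
        raise ValueError("counts and max_halo_logm must have equal length")
    masses = [m for n, m in zip(counts, max_halo_logm) if n >= n_obs]
    return min(masses) if masses else None


# Hypothetical toy regions: neighbour count and log10 mass of the most massive halo.
toy_counts = [0, 2, 3, 6, 7, 12, 25, 4, 9]
toy_logm = [11.0, 11.2, 11.9, 11.6, 11.8, 12.1, 12.6, 12.5, 11.7]

print(min_mass_above_count(toy_counts, toy_logm, 5))   # 11.6
print(min_mass_above_count(toy_counts, toy_logm, 25))  # 12.6
```

The toy region with only four neighbours but a 10^12.5 halo shows why no matching upper limit follows: a massive halo can have few bright neighbours. With real simulation output, the estimate is only as reliable as the number of simulated regions at high counts.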

We can therefore conclude that the relatively average number counts observed in the QSO fields are not inconsistent with the QSOs being hosted by very massive halos. However, one could make an equally strong case that they are, in fact, not. If we translate our results of Fig. 18 to the QSO fields, which cover a smaller projected area of (~5 h^-1 Mpc)^2, and add back in the average counts from the fore- and background provided by our lightcone data, we estimate that for QSOs at z ~ 5.7 we require an overdensity of δ_s,ACS ~ 4 in order to put a lower limit of ~10^[…] M_⊙ on the QSO host mass, while δ_s,ACS ~ 1 is consistent with ~10^[…] M_⊙. At z = 6.2, we would require δ_s,ACS ≳ 2 for M > 10^[…] M_⊙ and δ_s,ACS > 1 for M > 10^[…] M_⊙. Comparison with the relatively low surface overdensities observed thus suggests that the halo mass is unconstrained by the current data. Nonetheless, we can at least conclude quite firmly that the QSOs are in far less rich environments (in terms of galaxy neighbours) than many of the rich regions found both in the simulations and in some of the deep field surveys described in the previous Section. To give a feel for what the QSO fields might have looked like if they were in highly overdense regions, we present close-up views in Fig. 19 of the three richest regions of z850 < 26.5 mag i775-dropouts, as marked by the small squares in Fig. 18. The central position, corresponding to that of the most massive halo in each region, is indicated by the green square. Large and small dots correspond to dropout galaxies having z850 < 26.5 and < 27.5 mag, respectively. For reference, we use blue circles to indicate galaxies that have been identified as part of a protocluster structure. The scale bar at the top left in each panel corresponds to the size of an ACS/WFC pointing used to observe z ~ 6 QSO fields. We make a number of interesting observations. First, given the current observational limits on depth (z850 = 26.5 mag) and field size (3.4', see scale bar) of the ACS observations of QSOs, it would actually be quite easy to miss some of these structures, as they typically span a region 2-3 ACS fields in diameter. Going to much fainter magnitudes would help considerably, but this is at present unfeasible. Note also that in three of the panels presented here, the galaxy associated with the massive central halo does not pass our magnitude limits. It is missed due to dust obscuration associated with very high star formation rates inside these halos, implying that such galaxies will be missed by large-area UV searches as well (unless, of course, they also host a luminous, unobscured QSO).
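As a rough illustration of how counts translate into the surface overdensity δ_s,ACS, the sketch below takes the mean MR surface density for z850 < 26.5 from Table 2 (0.18 arcmin^-2) and the 11.5 arcmin^2 ACS field area from Table 1, and assumes δ_s = N_obs/⟨N⟩ - 1. The Poisson tail probability is included only as a naive shot-noise reference: it ignores galaxy clustering, which makes the real field-to-field scatter larger (compare the uncertainties in Table 2), so it should not be read as the significance of a detection.

```python
import math


def surface_overdensity(n_obs: float, n_mean: float) -> float:
    """Surface overdensity delta_s = N_obs / <N> - 1."""
    return n_obs / n_mean - 1.0


def poisson_tail(n: int, mu: float) -> float:
    """P(N >= n) for a Poisson variable with mean mu (shot noise only)."""
    if n <= 0:
        return 1.0
    term = math.exp(-mu)  # P(N = 0)
    cdf = term
    for k in range(1, n):
        term *= mu / k  # recurrence: P(N = k) = P(N = k - 1) * mu / k
        cdf += term
    return max(0.0, 1.0 - cdf)


SIGMA_Z26P5 = 0.18  # arcmin^-2, mean MR surface density for z850 < 26.5 (Table 2)
ACS_AREA = 11.5     # arcmin^2, area of one ACS/WFC QSO field (Table 1)

n_mean = SIGMA_Z26P5 * ACS_AREA       # expected dropouts in a random ACS field
n_for_delta4 = (1.0 + 4.0) * n_mean   # counts needed for delta_s ~ 4
print(f"<N> per ACS field: {n_mean:.2f}")
print(f"counts needed for delta_s = 4: {n_for_delta4:.1f}")
print(f"naive Poisson P(N >= 11): {poisson_tail(11, n_mean):.1e}")
```

With only about two bright dropouts expected per field, an overdensity of δ_s,ACS ~ 4 corresponds to roughly ten objects, which is why a handful of extra or missing galaxies changes the inferred overdensity so strongly.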

Finally, we investigate the level of mutual correspondence between the most massive halos selected at z = 6 and z = 0. In Fig. 19 we already saw that the richest regions are associated with a very large number of galaxies that will become part of a cluster when evolved to z = 0. In the top row of Fig. 20 we show the mass of the most massive progenitors at z = 5.7 (left), z = 6.2 (middle) and z = 6.7 (right) of halos selected at z = 0 (see also Trenti et al. 2008). The dotted line indicates the threshold corresponding to massive galaxy clusters at z = 0. Although the progenitor mass increases systematically with increasing z = 0 mass, the dispersion in the mass of the most massive z ~ 6 progenitors is about one order of magnitude or more, and this is true even for the most massive clusters. As explained in detail by Trenti et al. (2008), this leads to an interesting complication for the refinement technique often used to simulate the most massive regions in the early Universe, in which the most massive region identified at z = 0 in a coarse-grid simulation is resimulated at high redshift. In the bottom panels of Fig. 20 we show the inverse relation between the most massive halos selected at z ~ 6 and their most massive descendants at z = 0. From this it becomes clear that even though the most massive z ~ 6 halos (e.g. those associated with QSOs) are most likely to end up in present-day clusters, some evolve into only modest local structures more compatible with, e.g., galaxy groups.
This implies that the present-day descendants of some of the first massive galaxies and supermassive black holes must be sought in sub-cluster environments.

Figure 19. Close-up views of three (20 h^-1 Mpc)^3 regions that were found to be highly overdense in z850 < 26.5 mag i775-dropouts, as marked by the squares in Fig. 18. The top row of panels corresponds to the three richest regions found at z = 5.7, while the bottom row corresponds to those at z = 6.2. The position of the most massive halo in each region is indicated by a green square. Large and small dots correspond to dropout galaxies having z850 < 26.5 and < 27.5 mag, respectively. Galaxies that have been identified as part of a protocluster structure are indicated by blue circles. The scale bar at the top left in each panel corresponds to the size of an ACS/WFC pointing used to observe z ~ 6 QSO fields. Note that the galaxy corresponding to the most massive halo, as indicated by the green square, is not always detected in our i775-dropout survey due to dust obscuration associated with very high star formation rates.
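The progenitor and descendant relations of Fig. 20 amount to two simple reductions over a merger tree. The sketch below assumes a hypothetical, flattened tree in which each z ~ 6 halo carries the ID of its z = 0 descendant; the halo IDs, log masses and the log M = 14.0 cluster cut are illustrative placeholders, not MR values or the threshold used in the paper.

```python
from typing import Dict, List, Sequence


def most_massive_progenitor(desc_ids: Sequence[str],
                            logm_z6: Sequence[float]) -> Dict[str, float]:
    """For each z = 0 descendant, the largest log mass among its z ~ 6 progenitors."""
    best: Dict[str, float] = {}
    for d, m in zip(desc_ids, logm_z6):
        if d not in best or m > best[d]:
            best[d] = m
    return best


def descendant_logm(desc_ids: Sequence[str],
                    logm_z0: Dict[str, float]) -> List[float]:
    """Log mass of the z = 0 descendant of each z ~ 6 halo (the inverse relation)."""
    return [logm_z0[d] for d in desc_ids]


# Hypothetical toy tree: six z ~ 6 halos merging into three z = 0 halos.
logm_z6 = [12.4, 11.9, 11.5, 12.1, 11.8, 11.2]
desc_ids = ["A", "A", "B", "C", "B", "C"]
logm_z0 = {"A": 14.6, "B": 14.3, "C": 13.6}
CLUSTER_LOGM = 14.0  # placeholder cluster threshold, not the paper's value

print(most_massive_progenitor(desc_ids, logm_z6))
for m6, m0 in sorted(zip(logm_z6, descendant_logm(desc_ids, logm_z0)), reverse=True):
    print(f"z~6 halo 10^{m6}: descendant 10^{m0}, cluster = {m0 >= CLUSTER_LOGM}")
```

In this toy tree the second most massive z ~ 6 halo (10^12.1) ends up in a group-scale descendant, while cluster B's most massive progenitor is only 10^11.8, qualitatively mirroring the two-way scatter described above.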



6 DISCUSSION 

Although our findings of the previous Section show that the apparent lack of excess neighbour counts near z ~ 6 QSOs is not inconsistent with them being hosted by supermassive dark matter halos, as suggested by their low comoving abundance and large inferred black hole masses, it is interesting to note that none of the QSO fields have densities that would place them amongst the richest structures in the z ~ 6 Universe. This leads to an intriguing question: where is the large-scale structure associated with QSOs?

One possibility that has been discussed (e.g. Kim et al. 2008) is that while the dark matter density near the QSOs is significantly higher compared to other fields, the strong ionizing radiation from the QSO may prevent the condensation of gas, thereby suppressing galaxy formation. Although it is not clear exactly how important such feedback processes are, we have found that protoclusters in the MR form inside density enhancements that can extend up to many tens of Mpc in size. Although we do not currently know whether the z ~ 6 QSOs might be associated with overdensities on scales larger than the few arcminutes probed by the ACS, it is unlikely that the QSO ionization field will suppress the formation of galaxies on such large scales (Wyithe et al. 2005). An alternative, perhaps more likely, explanation for the deficit of i775-dropouts near QSOs is that the dark matter halo mass of the QSOs is being greatly overestimated. Willott et al. (2005) suggest that the steepness of the halo mass function for rare high-redshift halos, combined with the sizeable intrinsic scatter in the correlation between black hole mass and stellar velocity dispersion or halo mass at low redshift, makes it much more probable that a 10^[…] M_⊙ black hole is found in a relatively low-mass halo than in a very rare halo of extremely high mass. Depending on the exact value of the scatter, the typical mass of a halo hosting a z ~ 6 QSO may be reduced by ~0.5-1.5 in log M_halo without breaking the low-redshift M_BH-σ relation. The net result is that QSOs occur in some subset of halos found in substantially less dense environments, which may explain the observations. This notion seems to be confirmed by the low dynamical mass of ~5 × 10^10 M_⊙ estimated for the inner few kpc of SDSS J1148+5251 at z = 6.43 based on CO line emission (Walter et al. 2004). This is in stark contradiction to the ~10^[…] M_⊙ stellar-mass bulges and ~10^[…] M_⊙ halos derived from other arguments. If true, models should then explain why the number density of such QSOs is as observed. On the other hand, recent theoretical work by Dijkstra et al. (2008) suggests that, in order to facilitate the formation of a supermassive (~10^[…] M_⊙) black hole by z ~ 6 in the first place, it may be necessary to have a rare pair of dark matter halos (~10^[…] M_⊙) in which the intense UV radiation from one halo prevents fragmentation in the other, so that the gas collapses directly into a supermassive black hole. This would constrain the QSOs to lie in even richer environments.

Figure 20. The correspondence between the most massive halos selected at z = 0 and z = 6 (see also Trenti et al. 2008). In the top row of panels we plot the mass of the most massive progenitor of halos selected at z = 0 for snapshots at z = 5.7 (left), z = 6.2 (middle) and z = 6.7 (right). In the bottom row of panels we plot the mass of the most massive z = 0 descendant for halos selected at z = 5.7 (left), z = 6.2 (middle) and z = 6.7 (right). In all panels the dotted line indicates the mass corresponding to the threshold we use to define clusters at z = 0 (M ≳ 10^[…] M_⊙). The dispersion in the mass of the most massive z ~ 6 progenitors of z = 0 clusters is over an order of magnitude. Conversely, the most massive halos present at z ~ 6 are not necessarily the most massive halos at z = 0, and a minority does not even pass the threshold for qualifying as a z = 0 cluster.
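The mass-function argument attributed above to Willott et al. (2005) is an Eddington-type bias, and a minimal version can be worked out under two simplifying assumptions that are ours rather than theirs: the halo mass function is approximated locally as a power law, dn/dlog M ∝ M^(-α), and log M_BH = a + b log M_halo with Gaussian scatter σ (in dex). Completing the square in the posterior for x = log M_halo gives a mean shifted below the naive inversion (log M_BH - a)/b by α ln(10) (σ/b)^2. The sketch checks this closed form against direct numerical summation; all parameter values are illustrative, not fits.

```python
import math

LN10 = math.log(10.0)


def analytic_shift(alpha: float, sigma: float, b: float) -> float:
    """Offset (dex) of the posterior mean of log M_halo below (y - a) / b."""
    return alpha * LN10 * (sigma / b) ** 2


def posterior_mean_logm(y: float, a: float, b: float, sigma: float, alpha: float,
                        lo: float = 8.0, hi: float = 16.0, n: int = 20001) -> float:
    """Posterior mean of x = log M_halo given y = log M_BH, by summation on a grid.

    Prior: dn/dx proportional to 10**(-alpha * x) (assumed local power law).
    Likelihood: y ~ Normal(a + b * x, sigma).
    """
    x0 = (y - a) / b  # naive inversion, used only to keep the exponents well scaled
    dx = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = lo + i * dx
        log_w = -alpha * LN10 * (x - x0) - 0.5 * ((y - a - b * x) / sigma) ** 2
        w = math.exp(log_w)
        num += w * x
        den += w
    return num / den


# Hypothetical, illustrative parameters: intercept a, slope b, scatter sigma (dex), slope alpha.
a, b, sigma, alpha = -9.5, 1.5, 0.4, 3.0
y = a + b * 12.5  # a black hole whose naive inversion gives log M_halo = 12.5

naive = (y - a) / b
numeric = posterior_mean_logm(y, a, b, sigma, alpha)
shift = analytic_shift(alpha, sigma, b)
print(f"naive {naive:.2f}, posterior mean {numeric:.2f}, analytic shift {shift:.2f} dex")
```

For these illustrative values the posterior mean lies about half a dex below the naive estimate. Because the offset scales as α σ^2/b^2, a steeper mass function or a larger scatter can push it towards the upper end of the range quoted above.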



7 RECOMMENDATIONS FOR FUTURE OBSERVATIONS 

The predicted large-scale distributions of i775-dropouts and Lyα emitters, as shown in, e.g., Figs. 8, 15 and 16, show variations in the large-scale structure on scales of up to ~1-2°, far larger than currently probed by deep HST or large-area ground-based surveys. A full appreciation of such structures could be important for a range of topics, including studies of the luminosity function at z ~ 6 and comparisons between ΛCDM predictions and gravitational clustering on very large scales. The total area probed by our simulation is a good match to a survey of ~20 deg^2 targeting i775-dropouts and Lyα emitters at z ~ 6 planned with the forthcoming Subaru/Hyper Suprime-Cam (first light expected 2013; M. Ouchi, private communication, 2008).



We found that the i775-dropouts associated with protoclusters are almost exclusively found in regions with positive density enhancements. A proper understanding of such dense regions may also be very important for studies of the epoch of reionization. Simulations suggest that even though the total number of ionizing photons is much larger in very large protocluster regions covering several tens of comoving Mpc than in the field (e.g. Ciardi et al. 2003; but see Iliev et al. 2006), they may still be the last to fully reionize, because the recombination rates are also much higher. If regions associated with QSOs or other structures were to contain significant patches of neutral hydrogen, this may affect both the observed number densities and the clustering of LBGs or Lyα emitters relative to our assumed mean attenuation (McQuinn et al. 2007). However, since our work mostly focuses on z ~ 6, when reionization is believed to be largely complete, this may be less of an issue than for surveys that probe earlier times at z > 7 (e.g. Kashikawa et al. 2006; McQuinn et al. 2007).

Our evaluation of the possible structures associated with QSOs leads to several suggestions for future observations. While it is unlikely that the Wide Field Camera 3 (WFC3), to be installed onboard HST in early 2009, will provide better constraints than HST/ACS due to its relatively small field of view, we have shown that either by surveying a larger area of ~10' × 10' or by going deeper by ~1 mag in z850, one significantly reduces the shot noise in the neighbour counts, allowing more reliable overdensities and (lower) limits on the halo masses to be estimated. A single pointing with ACS would require ~15-20 orbits in z850 to reach a 5σ point-source sensitivity of z850 = 27.5 mag for an object at z ~ 6. Given the typical sizes of the overdense regions shown in Fig. 19, a better approach would perhaps be to expand the area of the current QSO fields by several more ACS pointings at their present depth of z850 = 26.5 mag for about an equal amount of time. However, this may be achieved from the ground as well, using the much more efficient wide-field detector systems. Although this has been attempted by Willott et al. (2005) targeting three of the QSO fields, we note that their achieved depth of z850 = 25.5 was probably much too shallow to find any overdensities even if they are there. We stress that it is extremely important that foreground contamination is reduced as much as possible, for example by combining the observations with a deep exposure in the V606 band. This is currently not available for the QSO fields, making it very hard to calculate the exact magnitude of any excess counts present. While a depth of z850 = 27.5 mag seems out of reach for a statistical sample with HST, narrow-band Lyα surveys targeting the typically UV-faint Lyα emitters from the ground would be a very efficient alternative. Although a significant fraction of sources lacking Lyα may be lost compared to dropout surveys, such surveys have the clear additional advantage of providing redshift information. Most Lyα surveys are carried out in the atmospheric transmission windows that correspond to redshifted Lyα at either z ~ 5.7 or z ~ 6.6, for which efficient narrow-band filters exist. We therefore suggest that the experiment is most likely to succeed around QSOs at z ~ 5.7, rather than around the QSOs at z ~ 5.8-6.4 looked at so far. It is, however, possible to use combinations of, e.g., the z ~ 5.7 narrow-band filter with medium- or broad-band filters at ~9000 Å to place stronger constraints on the photometric redshifts of i775-dropouts in QSO fields (e.g. Ajiki et al. 2006).
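The depth-versus-area trade-off discussed above can be estimated with a back-of-the-envelope sketch. It assumes background-limited imaging, where signal-to-noise grows as the square root of exposure time, so going Δm magnitudes deeper at fixed S/N costs a factor 10^(0.8 Δm) in time. It takes the MR mean surface densities from Table 2 and the 11.5 arcmin^2 ACS field area from Table 1, and counts only Poisson shot noise.

```python
import math

SIGMA = {26.5: 0.18, 27.5: 2.31}  # arcmin^-2, mean MR surface densities (Table 2)
ACS_AREA = 11.5                   # arcmin^2 per ACS/WFC pointing (Table 1)


def time_factor(delta_mag: float) -> float:
    """Exposure-time multiplier to reach delta_mag deeper at fixed S/N.

    Background-limited case (assumption): S/N ~ flux * sqrt(t),
    so delta_mag = 1.25 * log10(t2 / t1).
    """
    return 10.0 ** (0.8 * delta_mag)


def counts_and_shot_noise(sigma: float, area: float):
    """Expected number of dropouts and fractional Poisson error 1/sqrt(N)."""
    n = sigma * area
    return n, 1.0 / math.sqrt(n)


# Same total time: one pointing 1 mag deeper, or k pointings at the present depth.
k = round(time_factor(1.0))
n_now, err_now = counts_and_shot_noise(SIGMA[26.5], ACS_AREA)
n_deep, err_deep = counts_and_shot_noise(SIGMA[27.5], ACS_AREA)
n_wide, err_wide = counts_and_shot_noise(SIGMA[26.5], k * ACS_AREA)

print(f"time factor for +1 mag: {time_factor(1.0):.1f} -> {k} pointings")
print(f"present single field: N = {n_now:.1f}, 1/sqrt(N) = {err_now:.2f}")
print(f"one deep field:       N = {n_deep:.1f}, 1/sqrt(N) = {err_deep:.2f}")
print(f"{k}-pointing mosaic:   N = {n_wide:.1f}, 1/sqrt(N) = {err_wide:.2f}")
```

Both options cut the fractional shot noise well below that of a single field at the present depth. In pure Poisson terms the deeper pointing collects more dropouts, so the preference for a wider mosaic expressed above rests on the physical sizes of the overdense regions (Fig. 19), which a shot-noise estimate alone does not capture.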

In the next decade, JWST will allow some intriguing further possibilities that may provide definite answers: using the target 0.7-0.9 μm sensitivity of the Near Infrared Camera (NIRCam) on JWST, we could reach point sources at 10σ as faint as z850 = 28.5 mag in a 10,000 s exposure, or we could map a large ~10' × 10' region around QSOs to a depth of z850 = 27.5 mag within a few hours. The Near Infrared Spectrograph (NIRSpec) will allow >100 simultaneous spectra to confirm the redshifts of very faint line or continuum objects over a >9 arcmin^2 field of view.



8 SUMMARY 

The main findings of our investigation can be summarized as follows.

• We have used the N-body plus semi-analytic modeling of De Lucia & Blaizot (2007) to construct the largest (4° × 4°) mock galaxy redshift survey of star-forming galaxies at z ~ 6 to date. We extracted large samples of i775-dropouts and Lyα emitters from the simulated survey, and showed that their main observational properties (colours, number densities, redshift distribution) and physical properties (M*, SFR, age, M_halo) are in fair agreement with the data as far as they are constrained by various surveys.

• The present-day descendants of i775-dropouts (brighter than M*_UV,z=6) are typically found in group environments at z = 0 (halo masses of a few times 10^[…] M_⊙). About one third of all i775-dropouts end up in halos corresponding to clusters, implying that the contribution of "protocluster galaxies" in typical i775-dropout surveys is significant.

• The projected sky distribution shows significant variations in the local surface density on scales of up to 1°, indicating that the largest surveys to date do not yet probe the full range of scales predicted by our ΛCDM models. This may be important for studies of the luminosity function, galaxy clustering, and the epoch of reionization.

• We present counts-in-cells frequency distributions of the number of objects expected per 3.4' × 3.4' HST/ACS field of view, finding good agreement with the GOODS field statistics. The largest positive deviations are due to structures associated with the seeds of massive clusters of galaxies ("protoclusters"). To guide the interpretation of current and future HST/ACS observations, we give the probabilities of randomly finding regions of a given surface overdensity depending on the presence or absence of a protocluster.

• We give detailed examples of the structure of protocluster regions. Although the typical separation between protocluster galaxies does not exceed ~10' (25 Mpc comoving), they sit in overdensities that extend up to a 30' radius, indicating that the protoclusters predominantly form deep inside the largest filamentary structures. These regions are very similar to two protoclusters of i775-dropouts or Lyα emitters found in the SDF (Ota et al. 2008) and SXDF (Ouchi et al. 2005) fields.

• We have made a detailed comparison between the number counts predicted by our simulation and those measured in fields observed with HST/ACS towards luminous z ~ 6 QSOs from SDSS, concluding that the observed fields are not particularly overdense in neighbour counts. We demonstrate that this does not rule out that the QSOs are in the most massive halos at z ~ 6, although we also cannot confirm it. We discuss the possible reasons for and implications of this intriguing result (see the Discussion in Section 6).

• We give detailed recommendations for follow-up observations using current and future instruments that can be used to better constrain the halo masses of z ~ 6 QSOs and the variations in the large-scale structure as probed by i775-dropouts and Lyα emitters (see Section 7).
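The counts-in-cells statistic referred to in the summary can be sketched generically as follows. This assumes a small, flat-sky patch with positions already projected to arcminute offsets from a grid corner, and a regular grid of square cells matching the 3.4' ACS footprint; it illustrates the technique rather than reproducing the code used for this work, and the positions below are invented.

```python
import math
from collections import Counter
from typing import Sequence


def counts_in_cells(x: Sequence[float], y: Sequence[float], cell: float,
                    n_side: int) -> Counter:
    """Histogram {count: number of cells} over an n_side x n_side grid of square cells.

    Positions x, y are flat-sky offsets in arcmin from the grid corner (assumption);
    objects falling outside the grid are ignored.
    """
    grid = [[0] * n_side for _ in range(n_side)]
    for xi, yi in zip(x, y):
        i, j = math.floor(xi / cell), math.floor(yi / cell)
        if 0 <= i < n_side and 0 <= j < n_side:
            grid[i][j] += 1
    return Counter(c for row in grid for c in row)


def prob_at_least(hist: Counter, n: int) -> float:
    """Fraction of cells containing at least n objects."""
    total = sum(hist.values())
    return sum(v for c, v in hist.items() if c >= n) / total


# Invented positions (arcmin) on a 2 x 2 grid of 3.4' cells; the last one falls outside.
x = [1.0, 2.0, 0.5, 4.0, 5.0, 10.0]
y = [1.0, 2.0, 3.0, 1.0, 5.0, 1.0]
hist = counts_in_cells(x, y, cell=3.4, n_side=2)
print(sorted(hist.items()))    # [(0, 1), (1, 2), (3, 1)]
print(prob_at_least(hist, 2))  # 0.25
```

Applied to a large mock survey, the same histogram gives the frequency distribution of counts per ACS-sized cell, and prob_at_least gives the chance of randomly finding a field at or above a given count.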
Table 1. Overview of i-dropout surveys.

Field Name                        Survey Area   z-band detection limit^a   Reference
                                  (arcmin^2)    (AB mag)
MR mock                           70,000        ~27.5                      This paper
HUDF                              11.2          ~29.2 (10σ, 0.2")          Bouwens et al. (2006)
HUDF05                            20.2          ~28.9 (5σ, 0.2")           Bouwens et al. (2007); Oesch et al. (2007)
HUDF-Ps                           17.0          ~28.5 (10σ, 0.2")          Bouwens et al. (2006)
GOODS                             316           ~27.5 (10σ, 0.2")          Bouwens et al. (2006)
ACS/GTO                           46            ~27.3 (6σ, 1.5")           Bouwens et al. (2003)
SDF                               876           ~26.6 (3σ, 2.0")           Kashikawa et al. (2004)
SXDF                              ~4,680        ~25.9 (5σ, 2.0")           Ota et al. (2005)
UKIDSS UDS + SXDF                 ~2,160        ~25.0 (5σ, 2.0")           McLure et al. (2006)
QSO SDSS J0836+0054 (z = 5.82)    11.5          ~26.5 (5σ, 0.2")           Zheng et al. (2006); Ajiki et al. (2006)
QSO SDSS J1306+0356 (z = 5.99)    ~11.5         ~26.5 (5σ, 0.2")           Kim et al. (2008)
QSO SDSS J1630+4012 (z = 6.05)    ~11.5         ~26.5 (5σ, 0.2")           Kim et al. (2008)
QSO SDSS J1048+4637 (z = 6.23)    ~30           ~26.2 (3σ, 1.5")           Willott et al. (2005)
                                  ~11.5         ~26.5 (5σ, 0.2")           Kim et al. (2008)
QSO SDSS J1030+0524 (z = 6.28)    ~30           ~26.2 (3σ, 1.5")           Willott et al. (2005)
                                  11.5          ~26.5 (5σ, 0.2")           Stiavelli et al. (2005); Kim et al. (2008)
QSO SDSS J1148+5251 (z = 6.43)    ~30           ~26.2 (3σ, 1.5")           Willott et al. (2005)
                                  ~11.5         ~26.5 (5σ, 0.2")           Kim et al. (2008)

^a The numbers between parentheses correspond to the significance and the diameter of a circular aperture.
Table 2. i-dropout surface densities in the MR mock survey and observations.

Surface density (arcmin^-2)

Magnitude    MR            MR              MR              MR              MR               B07^a          O08^a
             (total area)  (876 arcmin^2)  (320 arcmin^2)  (160 arcmin^2)  (11.5 arcmin^2)
z' < 27.50   2.31          2.36 ± 0.31     2.31 ± 0.45     2.31 ± 0.52     2.28 ± 0.98      2.18 ± 0.23    -
z' < 27.00   0.64          0.62 ± 0.11     0.63 ± 0.15     0.64 ± 0.17     0.63 ± 0.38      0.83 ± 0.09    -
z' < 26.50   0.18          0.17 ± 0.03     0.18 ± 0.04     0.16 ± 0.06     0.18 ± 0.15      0.33 ± 0.04    ~0.18
z' < 26.00   0.08          0.08 ± 0.01     0.08 ± 0.02     0.08 ± 0.03     0.08 ± 0.09      0.10 ± 0.02    ~0.11
z' < 25.50   0.04          0.04 ± 0.01     0.05 ± 0.01     0.04 ± 0.02     0.04 ± 0.06      0.03 ± 0.01    ~0.04
z' < 25.00   0.03          0.03 ± 0.01     0.03 ± 0.01     0.03 ± 0.01     0.03 ± 0.05      0.003 ± 0.003  ~0.01

^a Observed surface densities from Bouwens et al. (2007) and Ota et al. (2008).

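One way to read Table 2 is to compare the tabulated uncertainties with pure Poisson shot noise, sqrt(Σ A)/A for a surface density Σ over an area A. The sketch below assumes that the ± values are the field-to-field standard deviation among MR sub-areas of the given size (our reading; the table itself does not say so), in which case a ratio above one indicates extra variance from clustering rather than counting statistics alone.

```python
import math

# (mean surface density [arcmin^-2], tabulated scatter), keyed by limiting z' and area [arcmin^2].
TABLE2 = {
    26.5: {876.0: (0.17, 0.03), 320.0: (0.18, 0.04), 160.0: (0.16, 0.06), 11.5: (0.18, 0.15)},
    27.5: {876.0: (2.36, 0.31), 320.0: (2.31, 0.45), 160.0: (2.31, 0.52), 11.5: (2.28, 0.98)},
}


def poisson_density_error(sigma: float, area: float) -> float:
    """Poisson-only 1-sigma error on a surface density: sqrt(sigma * area) / area."""
    return math.sqrt(sigma * area) / area


ratios = {}
for mag, columns in TABLE2.items():
    for area, (sigma, scatter) in columns.items():
        ratios[(mag, area)] = scatter / poisson_density_error(sigma, area)
        print(f"z' < {mag}: {area:6.1f} arcmin^2, tabulated/Poisson = {ratios[(mag, area)]:.1f}")
```

All eight entries exceed the Poisson expectation, and for every area the excess is larger at z' < 27.5, where shot noise is smaller relative to the clustering signal.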


